Preface



This volume constitutes the proceedings of the 17th International Conference 
on Theorem Proving in Higher Order Logics (TPHOLs 2004) held September 
14-17, 2004 in Park City, Utah, USA. TPHOLs covers all aspects of theorem 
proving in higher-order logics as well as related topics in theorem proving and 
verification. 

There were 42 papers submitted to TPHOLs 2004 in the full research cate- 
gory, each of which was refereed by at least 3 reviewers selected by the program 
committee. Of these submissions, 21 were accepted for presentation at the con- 
ference and publication in this volume. In keeping with longstanding tradition, 
TPHOLs 2004 also offered a venue for the presentation of work in progress, 
where researchers invited discussion by means of a brief introductory talk and 
then discussed their work at a poster session. A supplementary proceedings con- 
taining papers about in-progress work was published as a 2004 technical report 
of the School of Computing at the University of Utah. 

The organizers are grateful to Al Davis, Thomas Hales, and Ken McMillan
for agreeing to give invited talks at TPHOLs 2004.

The TPHOLs conference traditionally changes continents each year in order 
to maximize the chances that researchers from around the world can attend. 
Starting in 1993, the proceedings of TPHOLs and its predecessor workshops 
have been published in the Springer Lecture Notes in Computer Science series: 
Error Analysis of Digital Filters
Using Theorem Proving 



Behzad Akbarpour and Sofiene Tahar 

Dept. of Electrical & Computer Engineering, Concordia University
1455 de Maisonneuve W., Montreal, Quebec, H3G 1M8, Canada
{behzad, tahar}@ece.concordia.ca



Abstract. When a digital filter is realized with floating-point or fixed- 
point arithmetics, errors and constraints due to finite word length are 
unavoidable. In this paper, we show how these errors can be mechanically 
analysed using the HOL theorem prover. We first model the ideal real 
filter specification and the corresponding floating-point and fixed-point 
implementations as predicates in higher-order logic. We use valuation 
functions to find the real values of the floating-point and fixed-point fil- 
ter outputs and define the error as the difference between these values 
and the corresponding output of the ideal real specification. Fundamen- 
tal analysis lemmas have been established to derive expressions for the 
accumulation of roundoff error in parametric Lth-order digital filters,
for each of the three canonical forms of realization: direct, parallel, and 
cascade. The HOL formalization and proofs are found to be in a good 
agreement with existing theoretical paper-and-pencil counterparts. 



1 Introduction 

Signal processing through digital techniques has become increasingly attractive 
with the rapid technological advancement in digital integrated circuits, devices, 
and systems. The availability of large scale general purpose computers and spe- 
cial purpose hardware has made real time digital filtering both practical and 
economical. Digital filters are a particularly important class of DSP (Digital 
Signal Processing) systems. A digital filter is a discrete time system that trans- 
forms a sequence of input numbers into another sequence of output, by means of 
a computational algorithm [13]. Digital filters are used in a wide variety of sig- 
nal processing applications, such as spectrum analysis, digital image and speech 
processing, and pattern recognition. Due to their well-known advantages, digital 
filters are often replacing classical analog filters. The three distinct and most 
outstanding advantages of the digital filters are their flexibility, reliability, and 
modularity. Excellent methods have been developed to design these filters with 
desired characteristics. The design of a filter is the process of determination of 
a transfer function from a set of specifications given either in the frequency do- 
main, or in the time domain, or for some applications, in both. The design of a 
digital filter starts from an ideal real specification. In a theoretical analysis of 
the digital filters, we generally assume that signal values and system coefficients 
are represented in the real number system and are expressed to infinite precision.
When a filter is implemented as special-purpose digital hardware or as a computer
algorithm, we must represent the signals and coefficients in some digital number 
system that must always be of a finite precision. Therefore, arithmetic operations 
must be carried out with an accuracy limited by this finite word length. There 
is a variety of types of arithmetic used in the implementation of digital sys- 
tems. Among the most common are the floating-point and fixed-point. Here, all 
operands are represented by a special format or assigned a fixed word length and 
a fixed exponent, while the control structure and the operations of the ideal pro- 
gram remain unchanged. The transformation from the real to the floating-point 
and fixed-point forms is quite tedious and error-prone. On the implementation 
side, the fixed-point model of the algorithm has to be transformed into the best 
suited target description, either using a hardware description or a programming 
language. This design process can be aided by a number of specialized CAD 
tools such as SPW (Cadence) [3], CoCentric (Synopsys) [20], Matlab-Simulink 
(Mathworks) [16], and FRIDGE (Aachen UT) [22]. 



[Figure 1: commutating diagram. The REAL, FP (floating-point), and FXP (fixed-point) descriptions are linked by conversion steps and are each embedded in HOL. Valuation functions map the FP and FXP models in HOL to their real values. The FP, FXP, and FP to FXP error analyses relate these real values to the REAL model in HOL and to each other.]

Fig. 1. Error analysis approach



In this paper we describe the error analysis of digital filters using the HOL 
theorem proving environment [5] based on the commutating diagram shown in 
Figure 1. We first model the ideal real filter specification and the
corresponding floating-point and fixed-point implementations as predicates in 
higher-order logic. For this, we make use of existing theories in HOL on the 
construction of real numbers [7], the formalization of IEEE-754 standard based 
floating-point arithmetic [8,9], and the formalization of fixed-point arithmetic 
[1, 2]. We use valuation functions to find the real values of the floating-point and 
fixed-point filter outputs and define the errors as the differences between these 
values and the corresponding output of the ideal real specification. Then we es- 
tablish fundamental lemmas on the error analysis of the floating-point and fixed- 
point roundings and arithmetic operations against their abstract mathematical 
counterparts. Finally, we use these lemmas as a model to derive expressions for 
the accumulation of the roundoff error in parametric Lth-order digital filters, for 
each of the three canonical forms of realization: direct, parallel, and cascade [18]. 






Using these forms, our verification methodology can be scaled up to any larger- 
order filter, either directly or by decomposing the design into a combination of 
internal sub-blocks. While the errors due to finite precision effects have been
studied extensively in theory since the late sixties [15], this paper is the
first in which a formalization and proof of this analysis for digital filters is
carried out using a mechanical theorem prover, here HOL. Our
results are found to be in good agreement with the theoretical ones.

The rest of this paper is organized as follows: Section 2 gives a review of 
the related work. Section 3 introduces the fundamental lemmas in HOL for the 
error analysis of the floating-point and fixed-point rounding and arithmetic op- 
erations. Section 4 describes the details of the error analysis in HOL of the class 
of linear difference equation digital filters implemented in the three canonical 
forms of realization. Finally, Section 5 concludes the paper. 



2 Related Work 

Work on the analysis of the errors due to finite precision effects in the
realization of digital filters has existed since their early days, but it has
relied on theoretical paper-and-pencil proofs and simulation techniques. For digital
filters realized with the fixed-point arithmetic, error problems have been stud- 
ied extensively. For instance, Knowles and Edwards [14] proposed a method for 
analysis of the finite word length effects in fixed-point digital filters. Gold and 
Rader [6] carried out a detailed analysis of the roundoff error for the first-order
and second-order fixed-point filters. Jackson [12] analyzed the roundoff noise for 
the cascade and parallel realizations of the fixed-point digital filters. While the 
roundoff noise for the fixed-point arithmetic enters into the system additively, it 
is a multiplicative component in the case of the floating-point arithmetic. This 
problem is analyzed first by Sandberg [19], who discussed the roundoff error 
accumulation and input quantization effects in the direct realization of the filter 
excited by a deterministic input. He also derived a bound on the time average 
of the squared error at the output. Liu and Kaneko [15] presented a general 
approach to the error analysis problem of digital filters using the floating-point 
arithmetic and calculated the error at the output due to the roundoff accumula- 
tion and input quantization. Expressions are derived for the mean square error 
for each of the three canonical forms of realization: direct, cascade, and par- 
allel. Upper bounds that are useful for a special class of the filters are given. 
Oppenheim and Weinstein [17] discussed in some detail the effects of the finite
register length on implementations of the linear recursive difference equation 
digital filters, and the fast Fourier transform (FFT) algorithm. Comparisons of 
the roundoff noise in the digital filters using the different types of arithmetics 
have also been reported in [21]. 

In order to validate the error analysis, most of the above works compare the
theoretical results with corresponding experimental simulations. In this paper, 
we show how the above error analysis can be mechanically performed using the 
HOL theorem prover, providing a superior approach to validation by simulation. 
Our focus will be on the process of translating the hand proofs into equivalent
proofs in HOL. The analysis we propose is mostly inspired by the work done 
by Liu and Kaneko [15], who defined a general approach to the error analysis 
problem of digital filters using the floating-point arithmetic. Following a simi- 
lar approach, we have extended this theoretical analysis for fixed-point digital 
filters. In both cases, good agreement between the HOL formalized and the
theoretical results is obtained.

Through our work, we confirmed and strengthened the main results of the 
previously published theoretical error analysis, though we uncovered some minor 
errors in the hand proofs and located a few subtle corners that are overlooked 
informally. For example, in the theoretical fixed-point error analysis it is always 
assumed that the fixed-point addition causes no error and only the roundoff 
error in the fixed-point multiplication is analyzed [17]. This is under the as- 
sumption that there is no overflow in the result and also the input operands 
have the same attributes as the output. Using a mechanical theorem prover, 
we provide a more general error analysis in which we cover the roundoff errors 
in both the fixed-point addition and multiplication operations. On top of that, 
for the floating-point error analysis, we have used the formalization in HOL of 
the IEEE-754 standard [8], which had not yet been established at the time of
the above mentioned theoretical error analysis. This enabled us to cover a more 
complete set of rounding and overflow modes and degenerate cases which are 
not discussed in earlier theoretical work. 

Previous work on the error analysis in formal verification was done by Harri- 
son [9] who verified the floating-point algorithms such as the exponential function 
against their abstract mathematical counterparts using the HOL Light theorem 
prover. As the main theorem, he proved that the floating-point exponential func- 
tion has a correct overflow behavior, and in the absence of overflow the error 
in the result is bounded to a certain amount. He also reported on an error in 
the hand proof mostly related to forgetting some special cases in the analysis. 
This error analysis is very similar to the type of analysis performed for DSP 
algorithms. The major difference, however, is the use of statistical methods and 
mean square error analysis for DSP algorithms which is not covered in the error 
analysis of the mathematical functions used by Harrison. In this method, the er- 
ror quantities are treated as independent random variables uniformly distributed 
over a specific interval depending on the type of arithmetic and the rounding 
mode. Then the error analysis is performed to derive expressions for the vari- 
ance and mean square error. To perform such an analysis in HOL, we need to 
develop a mechanized theory on the properties of random variables and random 
processes. This type of analysis is not addressed in this paper and is a part of our 
work in progress. Huhn et al. [11] proposed a hybrid formal verification method 
combining different state-of-the-art techniques to guide the complete design flow 
of imprecisely working arithmetic circuits starting at the algorithmic down to 
the register transfer level. The usefulness of the method is illustrated with the 
example of the discrete cosine transform algorithms. In particular, the authors 
have shown the use of computer algebra systems like Mathematica or Maple 
at the algorithmic level to reason about real numbers and to determine certain
error bounds for the results of numerical operations. In contrast to [11], we
propose an error analysis for digital filters using the HOL theorem prover. Although
computer algebra systems such as Maple or Mathematica are much more
popular and have many powerful decision procedures and heuristics, theorem
provers are more expressive, more precise, and more reliable [10]. One option
is to combine the rigour of theorem provers with the power of computer
algebra systems, as proposed in [10].

3 Error Analysis Models 

In this section we introduce the fundamental error analysis theorems [23, 4], and 
the corresponding lemmas in HOL for the floating-point [8, 9] and fixed-point [1, 
2] arithmetics. These theorems are then used in the next sections as a model for 
the analysis of the roundoff error in digital filters. 

3.1 Floating-Point Error Model 

In analyzing the effects of floating-point roundoff, the effects of rounding will be 
represented multiplicatively. The following theorem is the most fundamental in 
the floating-point rounding-error theory [23,4]. 

Theorem 1: If the real number x, located within the floating-point range, is
rounded to the closest floating-point number $x_r$, then

$x_r = x(1 + \delta)$, where $|\delta| \le 2^{-p}$ (1)

and p is the precision of the floating-point format.

In HOL, we proved this theorem in the IEEE single precision floating-point 
format for the case of rounding to nearest as follows: 

Lemma 1: FLOAT_ROUND_RELATIVE_ERROR

⊢ normalizes x ⇒ ∃ e. abs (e) ≤ (1 / 2 pow ((fracwidth X) + 1)) ∧
(Val (float (round X To_nearest x)) = x * (1 + e))

where the function normalizes defines the criteria for an arbitrary real number to
be in the normalized range of floating-point numbers [8], fracwidth extracts the
fraction width parameter from the floating-point format X, Val is the
floating-point valuation function, float is the bijection function that converts a triple
of natural numbers into the floating-point type, and round is the floating-point
rounding function [9].
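The statement of Lemma 1 can be spot-checked numerically outside HOL. The following is a minimal sketch, not part of the HOL development: it assumes that Python's `struct` module rounds a double to the nearest IEEE single-precision value, and it uses exact rational arithmetic to compute the relative error e. The helper names `round_to_single` and `relative_rounding_error` are hypothetical.

```python
import struct
from fractions import Fraction

FRACWIDTH = 23  # fraction width of the IEEE single-precision format


def round_to_single(x: float) -> float:
    """Round a double to the nearest single-precision value (assumed behaviour of struct)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def relative_rounding_error(x: float) -> Fraction:
    """Return e such that Val(round x) = x * (1 + e), computed exactly."""
    exact = Fraction(x)
    rounded = Fraction(round_to_single(x))
    return (rounded - exact) / exact


# Bound from Lemma 1: 1 / 2 pow (fracwidth + 1)
bound = Fraction(1, 2 ** (FRACWIDTH + 1))

# Sample values inside the normalized single-precision range.
for x in (0.1, 1.0 / 3.0, 123456.789, -2.5e-30, 3.0e38):
    assert abs(relative_rounding_error(x)) <= bound
```

Such a check only samples a few inputs; the HOL lemma covers every real number in the normalized range.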

To prove this theorem [4], we first proved the following lemma which locates
a real number in a binade (the floating-point numbers between two adjacent
powers of 2):

Lemma 2: REAL_IN_BINADE

⊢ normalizes x ⇒ ∃ j. j ≤ ((emax X) - 2) ∧
(2 pow (j + 1) / 2 pow (bias X)) ≤ abs x ∧
abs x < (2 pow (j + 2) / 2 pow (bias X))
where the function emax defines the maximum exponent in a given floating-point
format, and bias defines the exponent bias in the floating-point format,
which is a constant used to make the exponent's range nonnegative. Using this
lemma we can rewrite the general floating-point absolute error bound theorem
(ERROR_BOUND_NORM_STRONG) developed in [9] as follows:

Lemma 3: ERROR_BOUND_NORM_STRONG_NORMALIZE

⊢ normalizes x ⇒
∃ j. abs (error x) ≤ (2 pow j / 2 pow (bias X + fracwidth X))

which states that if the absolute value of a real number is in the representable
range of the normalized floating-point numbers, then the absolute value of the
error is less than or equal to $2^{j} / 2^{(\mathrm{bias}\,X + \mathrm{fracwidth}\,X)}$.
The function error defines the error resulting from rounding a real number to a
floating-point value and is defined as follows [9]:

⊢def error x = (Val (float (round X To_nearest x)) - x)

Since $2^{j+1} / 2^{\mathrm{bias}\,X} \le |x|$ for the real numbers in the normalized
region, as proved in Lemma 2, we have
$|\mathrm{error}\ x| / |x| \le (2^{j} / 2^{(\mathrm{bias}\,X + \mathrm{fracwidth}\,X)}) / (2^{j+1} / 2^{\mathrm{bias}\,X})$,
or $|\mathrm{error}\ x| / |x| \le 1 / 2^{(\mathrm{fracwidth}\,X + 1)}$. Finally,
defining e = (error x / x) will complete the proof of the floating-point relative
error bound theorem as described in Lemma 1.
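The chain of inequalities in this proof can be illustrated with concrete numbers. The sketch below is an informal check under stated assumptions: single-precision parameters (bias 127, fraction width 23, emax 255), rounding through Python's `struct` module, and a hypothetical helper `binade_index` that recovers the index j of Lemma 2 from `math.frexp`.

```python
import math
import struct
from fractions import Fraction

BIAS, FRACWIDTH, EMAX = 127, 23, 255  # assumed IEEE single-precision parameters


def round_to_single(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def binade_index(x: float) -> int:
    """Return j with 2^(j+1)/2^bias <= |x| < 2^(j+2)/2^bias."""
    _, e = math.frexp(abs(x))  # |x| = m * 2**e with 0.5 <= m < 1
    return e + BIAS - 2


def check_chain(x: float) -> bool:
    """Check the bounds of Lemma 2, Lemma 3, and Lemma 1 for one value."""
    j = binade_index(x)
    ax = abs(Fraction(x))
    lower = Fraction(2) ** (j + 1) / Fraction(2) ** BIAS
    upper = Fraction(2) ** (j + 2) / Fraction(2) ** BIAS
    err = abs(Fraction(round_to_single(x)) - Fraction(x))
    abs_bound = Fraction(2) ** j / Fraction(2) ** (BIAS + FRACWIDTH)
    rel_bound = Fraction(1, 2 ** (FRACWIDTH + 1))
    return (0 <= j <= EMAX - 2          # range of j in Lemma 2
            and lower <= ax < upper     # x lies in the binade
            and err <= abs_bound        # absolute bound of Lemma 3
            and err / ax <= rel_bound)  # relative bound of Lemma 1


assert all(check_chain(x) for x in (1.0, 0.1, -123456.789, 2.0 ** -126))
```

Dividing the absolute bound by the lower end of the binade is exactly the step that turns Lemma 3 into Lemma 1.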

Next, we apply the floating-point relative rounding error analysis theorem 
(Theorem 1) to the verification of the arithmetic operations. The goal is to 
prove the following theorem in which floating-point arithmetic operations such 
as addition, subtraction, multiplication, and division are related to their abstract 
mathematical counterparts according to the corresponding errors. 

Theorem 2: Let * denote any of the floating-point operations +, -, ×, /. Then

$fl(x * y) = (x * y)(1 + \delta)$, where $|\delta| \le 2^{-p}$ (2)

and p is the precision of the floating-point format. The notation fl (.) is used to
denote that the operation is performed using the floating-point arithmetic.

To prove this theorem in HOL, we start from the already proved lemmas on 
the absolute analysis of rounding error in the floating-point arithmetic operations 
(FLOAT_ADD) developed in [9]. We have converted these lemmas to the following
relative error analysis version, using the relative error bound analysis of floating- 
point rounding (Lemma 1): 

Lemma 4: FLOAT_ADD_RELATIVE

⊢ Finite a ∧ Finite b ∧ normalizes (Val a + Val b) ⇒
Finite (a + b) ∧ ∃ e. abs e ≤ (1 / 2 pow ((fracwidth X) + 1))
∧ (Val (a + b) = (Val a + Val b) * (1 + e))

where the function Finite defines the finiteness criteria for the floating-point 
numbers. Note that we use the conventional symbols for arithmetic operations 
on floating-point numbers using the operator overloading in HOL. 
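As an informal illustration of Theorem 2 and Lemma 4, the following sketch emulates single-precision operations in Python and measures the relative error e of each result exactly. It assumes that computing an operation on two single-precision operands in double precision and then rounding the result to single precision yields the correctly rounded single-precision result; the helpers `fl` and `relative_error` are hypothetical names, not HOL functions.

```python
import operator
import struct
from fractions import Fraction

FRACWIDTH = 23  # IEEE single precision


def round_to_single(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fl(op, a: float, b: float) -> float:
    """Emulated single-precision operation; a and b must be single-precision values."""
    return round_to_single(op(a, b))


def relative_error(op, a: float, b: float) -> Fraction:
    """Return e such that Val(a op b) = (Val a op Val b) * (1 + e)."""
    exact = op(Fraction(a), Fraction(b))
    return (Fraction(fl(op, a, b)) - exact) / exact


a = round_to_single(0.1)
b = round_to_single(0.7)
bound = Fraction(1, 2 ** (FRACWIDTH + 1))

errors = {op.__name__: relative_error(op, a, b)
          for op in (operator.add, operator.sub, operator.mul, operator.truediv)}
assert all(abs(e) <= bound for e in errors.values())
```

The operands are rounded to single precision first, so the measured error comes only from the operation itself, which is the situation Lemma 4 describes.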
3.2 Fixed-Point Error Model

While the rounding error for the floating-point arithmetic enters into the sys- 
tem multiplicatively, it is an additive component for the fixed-point arithmetic. 
In this case the fundamental error analysis theorem can be stated as follows [23].

Theorem 3: If the real number x, located in the range of the fixed-point numbers
with format X', is rounded to the closest fixed-point number $x'_r$, then

$x'_r = x + \epsilon$, where $|\epsilon| \le 2^{-\mathrm{fracbits}(X')}$ (3)

and fracbits is a function that extracts the number of bits that are to the right
of the binary point in the given fixed-point format.

This theorem is proved in HOL as follows [1]: 

Lemma 5: FXP_ROUND_ABSOLUTE_ERROR_BOUND

⊢ (validAttr X') ∧ (representable X' x) ⇒
abs (Fxp_error X' x) ≤ (1 / 2 pow (fracbits X'))

where the function validAttr defines the validity of the fixed-point format,
representable defines the criteria for a real number to be in the representable range
of the fixed-point format, and Fxp_error defines the fixed-point rounding error.
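The absolute bound of Lemma 5 can be illustrated with a small model of fixed-point rounding. This is a simplified sketch: it ignores the word length and the representable range, models only the number of fractional bits, and rounds to the nearest grid point with ties going upward. The names `fxp_round` and `fxp_error` are hypothetical stand-ins and are not the HOL definitions.

```python
import math
from fractions import Fraction


def fxp_round(x: Fraction, fracbits: int) -> Fraction:
    """Round x to the nearest multiple of 2^-fracbits (ties rounded upward)."""
    scaled = x * 2 ** fracbits
    return Fraction(math.floor(scaled + Fraction(1, 2)), 2 ** fracbits)


def fxp_error(x: Fraction, fracbits: int) -> Fraction:
    """Rounding error: fixed-point value minus the real value."""
    return fxp_round(x, fracbits) - x


fracbits = 8
bound = Fraction(1, 2 ** fracbits)  # 1 / 2 pow (fracbits X')
for x in (Fraction(1, 3), Fraction(-7, 9), Fraction(355, 113), Fraction(5, 16)):
    assert abs(fxp_error(x, fracbits)) <= bound
```

In this model round-to-nearest actually stays within half of the stated bound; the error is additive and does not scale with the magnitude of x, in contrast to the floating-point case.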

The verification of the fixed-point arithmetic operations using the absolute 
error analysis of the fixed-point rounding (Theorem 3) can be stated as in the 
following theorem in which the fixed-point arithmetic operations are related to 
their abstract mathematical counterparts according to the corresponding errors. 

Theorem 4: Let * denote any of the fixed-point operations +, -, ×, /, with a
given format X'. Then

$fxp(x * y) = (x * y) + \epsilon$, where $|\epsilon| \le 2^{-\mathrm{fracbits}(X')}$ (4)

and the notation fxp (.) is used to denote that the operation is performed using
the fixed-point arithmetic. This theorem is proved in HOL using the following
lemma [1]:

Lemma 6: FXP_ADD_ABSOLUTE

⊢ (Isvalid a) ∧ (Isvalid b) ∧ validAttr (X') ∧
representable X' (value a + value b) ⇒ (Isvalid (FxpAdd X' a b)) ∧
∃ e. abs e ≤ (1 / 2 pow (fracbits X')) ∧
value (FxpAdd X' a b) = (value a + value b) + e

where Isvalid defines the validity of a fixed-point number, value is the fixed-point 
valuation, and FxpAdd is the fixed-point addition. 
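Lemma 6 allows the operands to carry more fractional bits than the result format, in which case the addition itself introduces a rounding error. The sketch below illustrates this with the same simplified rounding model (fractional bits only, no overflow handling); `fxp_add` and `fxp_mul` are hypothetical helpers and only mimic the role of FxpAdd.

```python
import math
from fractions import Fraction


def fxp_round(x: Fraction, fracbits: int) -> Fraction:
    """Round x to the nearest multiple of 2^-fracbits (ties rounded upward)."""
    return Fraction(math.floor(x * 2 ** fracbits + Fraction(1, 2)), 2 ** fracbits)


def fxp_add(a: Fraction, b: Fraction, fracbits: int) -> Fraction:
    """Exact sum rounded into a result format with the given fractional bits."""
    return fxp_round(a + b, fracbits)


def fxp_mul(a: Fraction, b: Fraction, fracbits: int) -> Fraction:
    """Exact product rounded into a result format with the given fractional bits."""
    return fxp_round(a * b, fracbits)


a, b = Fraction(3, 16), Fraction(6, 16)  # operands with 4 fractional bits

# Same format for operands and result: the addition is exact.
assert fxp_add(a, b, 4) - (a + b) == 0

# Result format with only 2 fractional bits: the addition is rounded.
e_add = fxp_add(a, b, 2) - (a + b)
assert e_add != 0 and abs(e_add) <= Fraction(1, 2 ** 2)

# Multiplication generally needs rounding even when the formats agree.
e_mul = fxp_mul(a, b, 4) - a * b
assert e_mul != 0 and abs(e_mul) <= Fraction(1, 2 ** 4)
```

The first assertion corresponds to the usual textbook assumption that fixed-point addition is error free; the second shows why a more general analysis also has to account for rounding in additions.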

4 Error Analysis of Digital Filters in HOL 

In this section, the principal results for the roundoff accumulation in digital
filters using the mechanized theorem proving are derived and summarized. We
shall employ the models for the floating-point and fixed-point roundoff errors in
HOL presented in the previous section. In the following, we will first describe
in detail the theory behind the analysis and then explain how this analysis is
performed in HOL.

The class of digital filters considered in this paper is that of linear constant
coefficient filters specified by the difference equation:

$w_n = \sum_{k=0}^{M} b_k x_{n-k} - \sum_{k=1}^{L} a_k w_{n-k}$ (5)

where $\{x_n\}$ is the input sequence and $\{w_n\}$ is the output sequence. L is the
order of the filter, and M can be any positive number less than L. There are
three canonical forms of realizing a digital filter, namely the direct, parallel, and
cascade forms (Figure 2) [18].
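The difference equation (5) can be evaluated directly over exact rationals, which gives a reference for the ideal real specification. The following is a minimal sketch assuming zero initial conditions (samples with negative index are taken as 0); the function name `direct_form` and the list layout of the coefficients are illustrative choices.

```python
from fractions import Fraction


def direct_form(b, a, x):
    """Ideal direct-form filter.

    b = [b_0, ..., b_M], a = [a_1, ..., a_L], x = input samples.
    Samples before n = 0 are assumed to be zero.
    """
    w = []
    for n in range(len(x)):
        input_sum = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        output_sum = sum(a[k - 1] * w[n - k]
                         for k in range(1, len(a) + 1) if n - k >= 0)
        w.append(input_sum - output_sum)
    return w


# First-order example: w_n = x_n + (1/2) w_{n-1}, driven by a unit impulse.
impulse = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
response = direct_form([Fraction(1)], [Fraction(-1, 2)], impulse)
assert response == [Fraction(1), Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]
```

Note the sign convention: the feedback coefficients a_k enter with a minus sign, so a_1 = -1/2 produces a decaying response.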




[Figure 2: block diagrams of the three realizations: a) Direct form, b) Parallel form, c) Cascade form]

Fig. 2. Canonical forms of digital filter realizations
If the output sequence is calculated by using the equation (5), the digital
filter is said to be realized in the direct form. Figure 2 (a) illustrates the direct
form realization of the filter using the corresponding blocks for addition,
multiplication by a constant, and the delay element.

The implementation of a digital filter in the parallel form is shown in
Figure 2 (b) in which the entire filter is visualized as the parallel connection of
the simpler filters $H_i$ of a lower order. In this case, K intermediate outputs
$\{w_n^i\}$, i = 1,2,...,K are first calculated and then summed to form the total
output $\{w_n\}$. Therefore, for the input sequence $\{x_n\}$ we have:

$w_n^i = f_i x_n + g_i x_{n-1} - c_i w_{n-1}^i - d_i w_{n-2}^i$ (6)

where the parameters $f_i$, $g_i$, $c_i$, and $d_i$ are obtained from the parameters $a_i$ and
$b_i$ in equation (5) using the parallel expansion. The output of the entire filter
$w_n$ is then related to $w_n^i$ by:

$w_n = w_n^1 + w_n^2 + \cdots + w_n^K$ (7)
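A parallel realization can be sketched by evaluating each second-order path and summing the K intermediate outputs. The sketch below assumes each path has the form $w_n^i = f_i x_n + g_i x_{n-1} - c_i w_{n-1}^i - d_i w_{n-2}^i$ with zero initial conditions; the function names and the tuple layout (f, g, c, d) are illustrative.

```python
from fractions import Fraction


def parallel_section(f, g, c, d, x):
    """One parallel path with zero initial conditions."""
    w = []
    for n in range(len(x)):
        x1 = x[n - 1] if n >= 1 else 0
        w1 = w[n - 1] if n >= 1 else 0
        w2 = w[n - 2] if n >= 2 else 0
        w.append(f * x[n] + g * x1 - c * w1 - d * w2)
    return w


def parallel_form(sections, x):
    """Sum the outputs of all parallel paths sample by sample."""
    outputs = [parallel_section(f, g, c, d, x) for (f, g, c, d) in sections]
    return [sum(samples) for samples in zip(*outputs)]


impulse = [Fraction(1), Fraction(0), Fraction(0)]
sections = [
    (Fraction(1), Fraction(0), Fraction(-1, 2), Fraction(0)),  # w_n = x_n + w_{n-1}/2
    (Fraction(1), Fraction(0), Fraction(-1, 3), Fraction(0)),  # w_n = x_n + w_{n-1}/3
]
total = parallel_form(sections, impulse)
assert total == [Fraction(2), Fraction(5, 6), Fraction(13, 36)]
```

Each path only sees the common input, so rounding errors in one path do not propagate into the others before the final summation.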



The implementation of a digital filter in the cascade form is shown in
Figure 2 (c) in which the filter is visualized as a cascade of lower-order filters. From the
input $\{x_n\}$, the intermediate output $\{w_n^1\}$ is first calculated, and then this is the
input to the second filter. Continuing in this manner, the final output $w_n = w_n^K$
is calculated. Since the output of the ith section ($\{w_n^i\}$) is the input of the (i+1)th
section, the following equation holds:

$w_n^{i+1} = w_n^i + k_i w_{n-1}^i + l_i w_{n-2}^i - c_i w_{n-1}^{i+1} - d_i w_{n-2}^{i+1}$ (8)

where the parameters $k_i$, $l_i$, $c_i$, and $d_i$ are obtained from the parameters $a_i$ and
$b_i$ in equation (5) using the serial expansion.
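The cascade structure can be sketched in the same style: each section consumes the complete output of the previous one. The code assumes sections of the form $w_n^{i+1} = w_n^i + k_i w_{n-1}^i + l_i w_{n-2}^i - c_i w_{n-1}^{i+1} - d_i w_{n-2}^{i+1}$ with zero initial conditions; the names `cascade_section` and `cascade_form` are illustrative.

```python
from fractions import Fraction


def cascade_section(k, l, c, d, w_in):
    """One cascade section driven by the output of the previous section."""
    w_out = []
    for n in range(len(w_in)):
        in1 = w_in[n - 1] if n >= 1 else 0
        in2 = w_in[n - 2] if n >= 2 else 0
        out1 = w_out[n - 1] if n >= 1 else 0
        out2 = w_out[n - 2] if n >= 2 else 0
        w_out.append(w_in[n] + k * in1 + l * in2 - c * out1 - d * out2)
    return w_out


def cascade_form(sections, x):
    """Feed the input through all sections in order."""
    w = list(x)
    for (k, l, c, d) in sections:
        w = cascade_section(k, l, c, d, w)
    return w


impulse = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
sections = [
    (Fraction(0), Fraction(0), Fraction(-1, 2), Fraction(0)),
    (Fraction(0), Fraction(0), Fraction(-1, 3), Fraction(0)),
]
out = cascade_form(sections, impulse)
# Same response as the direct form w_n = x_n + (5/6) w_{n-1} - (1/6) w_{n-2}.
assert out == [Fraction(1), Fraction(5, 6), Fraction(19, 36), Fraction(65, 216)]
```

Because each section's output feeds the next, a roundoff error introduced in an early section is filtered by all later sections, which is why the cascade form needs its own error flowgraph.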

There are three common sources of errors associated with the filter of the 
equation (5), namely [15]: 

1. input quantization: caused by the quantization of the input signal $\{x_n\}$
into a set of discrete levels.

2. coefficient inaccuracy: caused by the representation of the filter
coefficients $\{a_k\}$ and $\{b_k\}$ by a finite word length.

3. round-off accumulation: caused by the accumulation of roundoff errors 
at arithmetic operations. 



Therefore, for the digital filter of the equation (5) the actual computed output
is in general different from the reference output $\{w_n\}$. We denote the actual floating-point
and fixed-point outputs by $\{y_n\}$ and $\{v_n\}$, respectively. Then, we define the
corresponding errors at the nth output sample as:

$e_n = y_n - w_n$ (9)

$e'_n = v_n - w_n$ (10)

$e''_n = v_n - y_n$ (11)
where $e_n$ and $e'_n$ are defined as the errors between the actual floating-point and
fixed-point implementations and the ideal real specification, respectively, and $e''_n$ is
the error in the transition from the floating-point to the fixed-point level.

It is clear from the above discussion that for the digital filter of the equation
(5) realized in the direct form, we have:

$y_n = fl\left(\sum_{k=0}^{M} b_k x_{n-k} - \sum_{k=1}^{L} a_k y_{n-k}\right)$ (12)

and

$v_n = fxp\left(\sum_{k=0}^{M} b_k x_{n-k} - \sum_{k=1}^{L} a_k v_{n-k}\right)$ (13)

The notations fl (.) and fxp (.) are used to denote that the operations are
performed using the floating-point and fixed-point arithmetics, respectively. The
calculation is to be performed in the following manner. First, the output products
$a_k y_{n-k}$, k = 1,2,...,L are calculated separately and then summed. Next, the
same is done for the input products $b_k x_{n-k}$, k = 0,1,...,M. Finally, the output
summation is subtracted from the input one to obtain the main floating-point
output $y_n$. Similar discussion can be applied for the calculation of the fixed-point
output $v_n$. The corresponding flowgraph showing the effect of roundoff error
using the fundamental error analysis theorems (Theorems 2 and 4) according to
the equations (2) and (4), is given by Figure 3, which also indicates the order of
the calculation.
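The three levels and the errors of equations (9) to (11) can be simulated for a small filter. The sketch below follows the calculation order described above (output products, their sum, input products, their sum, final subtraction) and applies a rounding function after every operation. The 8-bit fractional fixed-point format, the emulation of single precision through `struct`, the example coefficients, and all helper names are assumptions made for illustration.

```python
import math
import struct
from fractions import Fraction
from functools import reduce

FRACBITS = 8  # hypothetical fixed-point format with 8 fractional bits


def to_single(x):
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_fxp(x):
    """Round a rational to the nearest multiple of 2^-FRACBITS."""
    return Fraction(math.floor(x * 2 ** FRACBITS + Fraction(1, 2)), 2 ** FRACBITS)


def past(seq, i):
    """Sample i of seq, with zero initial conditions."""
    return seq[i] if i >= 0 else 0


def direct_form(b, a, x, rnd):
    """Direct form with rnd applied after every arithmetic operation."""
    out = []
    for n in range(len(x)):
        out_products = [rnd(a[k - 1] * past(out, n - k)) for k in range(1, len(a) + 1)]
        out_sum = reduce(lambda s, t: rnd(s + t), out_products)
        in_products = [rnd(b[k] * past(x, n - k)) for k in range(len(b))]
        in_sum = reduce(lambda s, t: rnd(s + t), in_products)
        out.append(rnd(in_sum - out_sum))
    return out


b = [Fraction(5, 16)]
a = [Fraction(-29, 32)]
x = [Fraction(n % 4 - 1, 4) for n in range(16)]

w = direct_form(b, a, x, lambda value: value)                   # ideal real output
y = [Fraction(s) for s in direct_form([float(c) for c in b],
                                      [float(c) for c in a],
                                      [float(s) for s in x], to_single)]
v = direct_form(b, a, x, to_fxp)                                # fixed-point output

e = [yn - wn for yn, wn in zip(y, w)]      # equation (9)
e1 = [vn - wn for vn, wn in zip(v, w)]     # equation (10)
e2 = [vn - yn for vn, yn in zip(v, y)]     # equation (11)
assert e2 == [p - q for p, q in zip(e1, e)]
```

The coefficients and inputs are chosen to be exactly representable at all three levels, so the measured errors come only from roundoff accumulation and not from input quantization or coefficient inaccuracy.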

[Figure 3: error flowgraph of the direct form, with branches labelled by the products $a_1 y_{n-1}$, $a_2 y_{n-2}$, ..., $a_L y_{n-L}$ and, in parentheses, their fixed-point counterparts $a_1 v_{n-1}$, $a_2 v_{n-2}$, ..., $a_L v_{n-L}$]

Fig. 3. Error flowgraph for Lth-order filter (Direct form)

Formally, a flowgraph is a network of directed branches that connect at nodes.
Associated with each node is a variable or node value. Each branch has an input
signal and an output signal with a direction indicated by an arrowhead on it. In
a linear flowgraph, the output of a branch is a linear transformation of the input 
to the branch. The simplest examples are constant multipliers and adders, i.e., 
when the output of the branch is simply a multiplication or an addition of the 
input to the branch with a constant value, which are the only classes we con- 
sider in this paper. The linear operation represented by the branch is typically 
indicated next to the arrowhead showing the direction of the branch. For the 
case of a constant multiplier and adder, the constant is simply shown next to 
the arrowhead. When an explicit indication of the branch operation is omitted, 
this indicates a branch transmittance of unity, or identity transformation. By 
definition, the value at each node in a flowgraph is the sum of the outputs of 
all the branches entering the node. To complete the definition of the flowgraph 
notation, we define two special types of nodes. (1) Source nodes that have no 
entering branches. They are used to represent the injection of the external in- 
puts or signal sources into a flowgraph. (2) Sink nodes that have only entering 
branches. They are used to extract the outputs from a flowgraph [18]. 

The quantities $\delta_{n,k}$, k = 0,1,...,M, $\epsilon_{n,k}$, k = 1,2,...,L, $\zeta_{n,k}$, k = 1,2,...,M,
$\eta_{n,k}$, k = 2,3,...,L, and $\xi_n$ in Figure 3 are errors caused by the floating-point
roundoff at each arithmetic step. The corresponding error quantities for the
fixed-point roundoff (shown in parentheses) are $\delta'_{n,k}$, k = 0,1,...,M, $\epsilon'_{n,k}$, k = 1,2,...,L,
$\zeta'_{n,k}$, k = 1,2,...,M, $\eta'_{n,k}$, k = 2,3,...,L, and $\xi'_n$. Note that we have used one
flowgraph to represent both the floating-point and fixed-point cases,
simultaneously. For floating-point errors, the branch operations are interpreted as constant
multiplications, while for fixed-point errors the branch operations are interpreted
as constant additions. We have surrounded the fixed-point error quantities and
output samples by parentheses to distinguish them from their floating-point
counterparts. Therefore, the actual outputs $y_n$ and $v_n$ are seen to be given
explicitly by:

$y_n = \sum_{k=0}^{M} b_k \theta_{n,k} x_{n-k} - \sum_{k=1}^{L} a_k \phi_{n,k} y_{n-k}$ (14)

where

$\theta_{n,0} = (1 + \xi_n)(1 + \delta_{n,0}) \prod_{i=1}^{M} (1 + \zeta_{n,i})$

$\theta_{n,j} = (1 + \xi_n)(1 + \delta_{n,j}) \prod_{i=j}^{M} (1 + \zeta_{n,i})$, where j = 1,2,...,M

$\phi_{n,1} = (1 + \xi_n)(1 + \epsilon_{n,1}) \prod_{i=2}^{L} (1 + \eta_{n,i})$

$\phi_{n,j} = (1 + \xi_n)(1 + \epsilon_{n,j}) \prod_{i=j}^{L} (1 + \eta_{n,i})$, where j = 2,3,...,L

and

$v_n = \sum_{k=0}^{M} b_k x_{n-k} - \sum_{k=1}^{L} a_k v_{n-k} + \sum_{k=0}^{M} \delta'_{n,k} + \sum_{k=1}^{M} \zeta'_{n,k} + \sum_{k=1}^{L} \epsilon'_{n,k} + \sum_{k=2}^{L} \eta'_{n,k} + \xi'_n$ (15)

For the error analysis, we need to calculate the $y_n$ and $v_n$ sequences from
the equations (14) and (15), and compare them with the ideal output sequence
$w_n$ specified by the equation (5) to obtain the corresponding errors $e_n$, $e'_n$, and
$e''_n$, according to the equations (9), (10), and (11), respectively. Therefore, the
difference equations for the errors between the different levels showing the
accumulation of the roundoff error are derived as the following error analysis cases:

1. Real to Floating-Point Error Analysis:

$e_n + \sum_{k=1}^{L} a_k e_{n-k} = \sum_{k=0}^{M} b_k (\theta_{n,k} - 1) x_{n-k} - \sum_{k=1}^{L} a_k (\phi_{n,k} - 1) y_{n-k}$ (16)

2. Real to Fixed-Point Error Analysis:

$e'_n + \sum_{k=1}^{L} a_k e'_{n-k} = \sum_{k=0}^{M} \delta'_{n,k} + \sum_{k=1}^{M} \zeta'_{n,k} + \sum_{k=1}^{L} \epsilon'_{n,k} + \sum_{k=2}^{L} \eta'_{n,k} + \xi'_n$ (17)

3. Floating-Point to Fixed-Point Error Analysis:

$e''_n + \sum_{k=1}^{L} a_k e''_{n-k} = \sum_{k=0}^{M} \delta'_{n,k} + \sum_{k=1}^{M} \zeta'_{n,k} + \sum_{k=1}^{L} \epsilon'_{n,k} + \sum_{k=2}^{L} \eta'_{n,k} + \xi'_n - \sum_{k=0}^{M} b_k (\theta_{n,k} - 1) x_{n-k} + \sum_{k=1}^{L} a_k (\phi_{n,k} - 1) y_{n-k}$ (18)
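All three error equations share the same left-hand side: the error is filtered by the recursive part of the filter, and the right-hand side collects the roundoff injected at step n. The sketch below checks this structure exactly for a simplified model in which only the final result of each step is rounded to a hypothetical fixed-point format, so that the injected error is a single term per step; the helper names are illustrative.

```python
import math
from fractions import Fraction

FRACBITS = 6  # hypothetical fixed-point format


def to_fxp(x):
    """Round a rational to the nearest multiple of 2^-FRACBITS."""
    return Fraction(math.floor(x * 2 ** FRACBITS + Fraction(1, 2)), 2 ** FRACBITS)


def past(seq, i):
    """Sample i of seq, with zero initial conditions."""
    return seq[i] if i >= 0 else Fraction(0)


def run(b, a, x, rnd):
    """Return the computed sequence and the roundoff injected at each step."""
    u, injected = [], []
    for n in range(len(x)):
        exact = (sum(b[k] * past(x, n - k) for k in range(len(b)))
                 - sum(a[k - 1] * past(u, n - k) for k in range(1, len(a) + 1)))
        u.append(rnd(exact))
        injected.append(u[n] - exact)
    return u, injected


b = [Fraction(1, 3), Fraction(1, 5)]
a = [Fraction(-1, 2), Fraction(1, 7)]
x = [Fraction(n, 10) for n in range(12)]

w, _ = run(b, a, x, lambda value: value)   # ideal real output
v, injected = run(b, a, x, to_fxp)         # output with rounding
e = [vn - wn for vn, wn in zip(v, w)]

# Left-hand side of the error difference equation: e_n + sum_k a_k e_{n-k}
lhs = [e[n] + sum(a[k - 1] * past(e, n - k) for k in range(1, len(a) + 1))
       for n in range(len(x))]
assert lhs == injected
assert all(abs(r) <= Fraction(1, 2 ** FRACBITS) for r in injected)
```

In the full analysis the injected term is the sum of the individual roundoff quantities of every product, addition, and the final subtraction, but the recursion on the left-hand side is unchanged.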

Similar analysis is performed for the parallel and cascade forms of realization 
based on the error flowgraphs as shown in Figures 4 and 5, respectively. 

In HOL, we first specified a parametric Lth-order digital filters at the real, 
floating-point, and fixed-point abstraction levels, as predicates in higher-order 
logic. The direct form is defined in HOL using the equation (5). For the real spec- 
ification, we used the expression sum (m,n) /denoting ^ /(*)? which is a 

function available in the HOL real library [7] and defines the finite summation 
on the real numbers. For the floating-point and fixed-point specifications, we de- 
fined similar functions for the finite summations on the floating-point (floatsum) 
and fixed-point (fxp_sum) numbers, using the recursive definition in HOL. For 
the parallel form, we first specified the zth parallel path using the equation (6). 
Then, we specified the entire output as defined in equation (7), using the finite 
summation functions. Finally, we specified the cascade form of realization as de- 
fined in equation (8), using recursive definitions in HOL. For the error analysis 
of the digital filters in HOL, we first established lemmas to compute the output 
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b) Final parallel output 

Fig. 4. Error flowgraph for Tth-order filter (Parallel form) 



hyi-2 




Fig. 5. Error flowgraph for ith-order filter (Cascade form) 



real values of the floating-point and fixed-point filters according to equations
(14) and (15), for the direct form of realization. For this, we need to define the
finite product on the real numbers. We defined this function in HOL recursively
as the expression mul (m,n) f, denoting $\prod_{i=m}^{m+n-1} f(i)$. We then defined
the errors as the differences between the output of the real filter specification
and the corresponding real values of the floating-point and fixed-point filter
implementations (Real_To_Float_Error, Real_To_Fxp_Error), and the error in
transition from the floating-point to fixed-point levels (Float_To_Fxp_Error),
according to equations (9), (10), and (11), respectively. Next, we established
lemmas for the accumulation of the round-off error between the different levels,
according to equations (16), (17), and (18). Finally, we proved these lemmas using
the fundamental floating-point and fixed-point error analysis lemmas, based on the
error models presented in Section 3. The lemmas are proved by induction on
the parameters L and M for the direct form of realization. Similar analysis is
performed in HOL for the parallel and cascade realization forms. For these cases,
we proved the corresponding lemmas by induction on the parameter K, which is
defined as the number of internal sub-filters connected in parallel or cascade
form to generate the final output. The corresponding error analysis lemmas in
HOL for the direct form of realization are listed in Appendix A.
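
The two finite operators used above can be mirrored outside HOL. The Python sketch below is an illustration only: `hol_sum` and `hol_mul` are hypothetical names, the recursion on the number of terms follows the usual recursive style for such definitions, and the base value 1 for the empty product is an assumption.

```python
def hol_sum(m, n, f):
    """sum (m,n) f: the n terms f(m) + ... + f(m+n-1), by recursion on n."""
    if n == 0:
        return 0
    return hol_sum(m, n - 1, f) + f(m + n - 1)

def hol_mul(m, n, f):
    """mul (m,n) f: the n factors f(m) * ... * f(m+n-1), by recursion on n."""
    if n == 0:
        return 1  # assumed value of the empty product
    return hol_mul(m, n - 1, f) * f(m + n - 1)
```

With this reading, sum (0,SUC M) ranges over k = 0, ..., M and sum (2,(L - 1)) over k = 2, ..., L, which matches the index ranges of the error equations.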

5 Conclusions 

In this paper, we describe a comprehensive methodology for the error analysis
of generic digital filters using the HOL theorem prover. The proposed approach
covers the three canonical forms of realization (direct, parallel and cascade),
entirely specified in HOL. We make use of existing theories in HOL on real, IEEE
standard based floating-point, and fixed-point arithmetic to model the ideal
filter specification and the corresponding implementations in higher-order logic.
We used valuation functions to define the errors as the differences between the
real values of the floating-point and fixed-point filter implementation outputs
and the corresponding output of the ideal real filter specification. Finally, we
established fundamental analysis lemmas as our model to derive expressions for
the accumulation of the roundoff error in digital filters. Related work has
existed since the late sixties, using theoretical paper-and-pencil proofs and
simulation techniques. We believe this is the first time a complete formal
framework has been considered using mechanical proofs in HOL for the error
analysis of digital filters. As future work, we plan to extend these lemmas to
analyse the worst-case, average, and variance errors. We also plan to extend the
verification to the lower levels of abstraction, and prove that the
implementation of a digital filter at the register transfer and netlist gate
levels implies the corresponding fixed-point specification using classical
hierarchical verification in HOL, hence bridging the gap between the hardware
implementation and the high levels of the mathematical specification. Finally,
we plan to link HOL with computer algebra systems to create a sound, reliable,
and powerful system for the verification of DSP systems. This opens new avenues
in using formal methods for the verification of DSP systems as a complement to
the traditional theoretical (analytical) and simulation techniques. We are
currently investigating the verification of other DSP algorithms such as the
fast Fourier transform (FFT), which is widely used as a building block in the
design of complex wired and wireless communication systems.
A   Digital Filter Error Analysis Lemmas in HOL

Lemma 7: L_ORDER_FILTER_DIRECT_FORM_REAL_TO_FLOAT_THM
⊢ L_Order_Filter_Direct_Form_Ideal_Spec a b x w M L ∧
  L_Order_Filter_Direct_Form_Float_Imp X a' b' x' y M L ⇒
  ∃ t f.
    if (L = 0) then
      (Real_To_Float_Error n = sum (0,SUC M) (λ i. Val (b' i) *
        (t i - 1) * Val (x' (n - i))))
    else
      ((Real_To_Float_Error n + sum (1,L) (λ i. a i *
        Real_To_Float_Error (n - i)) = sum (0,SUC M) (λ i. Val (b' i) *
        (t i - 1) * Val (x' (n - i))) - sum (1,L) (λ i. Val (a' i) *
        (f i - 1) * Val (y (n - i))))) ∧
    ∃ k d p e z.
      (abs k ≤ (1 / 2 pow ((fracwidth X) + 1))) ∧
      (∀ i. (i ≤ M) ⇒ (abs (d i) ≤ (1 / 2 pow ((fracwidth X) + 1)))) ∧
      (∀ i. (i ≤ M) ⇒ (abs (p i) ≤ (1 / 2 pow ((fracwidth X) + 1)))) ∧
      (∀ i. (i ≤ L) ⇒ (abs (e i) ≤ (1 / 2 pow ((fracwidth X) + 1)))) ∧
      (∀ i. (i ≤ L) ⇒ (abs (z i) ≤ (1 / 2 pow ((fracwidth X) + 1)))) ∧
      (t 0 = (1 + k) * (1 + d 0) * (mul (1,M) (λ i. (1 + p i)))) ∧
      (∀ j. (1 ≤ j ∧ j ≤ M) ⇒ (t j = (1 + k) * (1 + d j) *
        (mul (j,(M - (j - 1))) (λ j. (1 + p j))))) ∧
      (f 1 = (1 + k) * (1 + e 1) * (mul (2,(L - 1)) (λ i. (1 + z i)))) ∧
      (∀ j. (2 ≤ j ∧ j ≤ L) ⇒
        (f j = (1 + k) * (1 + e j) * (mul (j,(L - j + 1)) (λ j. (1 + z j)))))

Lemma 8: L_ORDER_FILTER_DIRECT_FORM_REAL_TO_FXP_THM
⊢ L_Order_Filter_Direct_Form_Ideal_Spec a b x w M L ∧
  L_Order_Filter_Direct_Form_Fxp_Imp X' a'' b'' x'' v M L ⇒
  ∃ k' d' p' e' z'.
    abs k' ≤ (1 / 2 pow (fracbits X')) ∧
    (∀ i. (i ≤ M) ⇒ abs (d' i) ≤ (1 / 2 pow (fracbits X'))) ∧
    (∀ i. (i ≤ M) ⇒ abs (p' i) ≤ (1 / 2 pow (fracbits X'))) ∧
    (∀ i. (i ≤ L) ⇒ abs (e' i) ≤ (1 / 2 pow (fracbits X'))) ∧
    (∀ i. (i ≤ L) ⇒ abs (z' i) ≤ (1 / 2 pow (fracbits X'))) ∧
    if (L = 0) then
      (Real_To_Fxp_Error n = sum (0,SUC M) (λ i. d' i) +
        sum (1,M) (λ j. p' j) + k')
    else
      (Real_To_Fxp_Error n + sum (1,L) (λ i. a i * Real_To_Fxp_Error
        (n - i)) = sum (0,SUC M) (λ i. d' i) + sum (1,M) (λ j. p' j) +
        sum (1,L) (λ i. e' i) + sum (2,(L - 1)) (λ j. z' j) + k')

Lemma 9: L_ORDER_FILTER_DIRECT_FORM_FLOAT_TO_FXP_THM
⊢ L_Order_Filter_Direct_Form_Ideal_Spec a b x w M L ∧
  L_Order_Filter_Direct_Form_Float_Imp X a' b' x' y M L ∧
  L_Order_Filter_Direct_Form_Fxp_Imp X' a'' b'' x'' v M L ⇒
  ∃ t f k' d' p' e' z'.
    if (L = 0) then
      (Float_To_Fxp_Error n = sum (0,SUC M) (λ i. d' i) +
        sum (1,M) (λ j. p' j) + k' - (sum (0,SUC M)
        (λ i. Val (b' i) * (t i - 1) * Val (x' (n - i)))))
    else
      (Float_To_Fxp_Error n + sum (1,L) (λ i. a i * Float_To_Fxp_Error
        (n - i)) = sum (0,SUC M) (λ i. d' i) + sum (1,M) (λ j. p' j) +
        sum (1,L) (λ i. e' i) + sum (2,(L - 1)) (λ j. z' j) + k' -
        sum (0,(SUC M)) (λ i. Val (b' i) * (t i - 1) * Val (x' (n - i)))
        + sum (1,L) (λ i. Val (a' i) * (f i - 1) * Val (y (n - i))))
Abstract. We present an algorithm for verifying that some specified
arguments of an inductively defined relation in a dependently typed
λ-calculus are uniquely determined by some other arguments. We prove it
correct and also show how to exploit this uniqueness information in
coverage checking, which allows us to verify that a definition of a function or
relation covers all possible cases. In combination, the two algorithms
significantly extend the power of the meta-reasoning facilities of the Twelf
implementation of LF.

1 Introduction 

In most logics and type theories, unique existence is not a primitive notion, but 
defined via existence and equality. For example, we might define ∃!x.A(x) to
stand for ∃x.A(x) ∧ ∀y.A(y) ⊃ x = y. Such definitions are usually made in both
first-order and higher-order logic, and in both the intuitionistic and the classical 
case. Expanding unique existence assertions in this manner comes at a price: not 
only do we duplicate the formula A, but we also introduce two quantifiers and 
an explicit equality. It is therefore natural to ask if we could derive some benefit 
for theorem proving by taking unique existence as a primitive. 
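
The cost of the expanded definition is easy to see over a finite domain. The following Python sketch (the name `exists_unique` is hypothetical) evaluates the expanded formula literally, so the predicate A is consulted in two places and both quantifiers are traversed.

```python
def exists_unique(domain, pred):
    """Expanded form: exists x. pred(x) and forall y. pred(y) implies x == y."""
    domain = list(domain)
    return any(
        pred(x) and all((not pred(y)) or x == y for y in domain)
        for x in domain
    )
```

For example, `exists_unique(range(10), lambda n: n * n == 9)` holds, while the same predicate over `range(-5, 6)` fails because both 3 and -3 satisfy it.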

In this paper we consider an instance of this problem, namely verifying and 
exploiting uniqueness in a logical framework. We show how to establish unique- 
ness of certain arguments to type families in the logical framework LF [7] as 
implemented in the Twelf system [15]. We further show how to exploit this 
uniqueness information to verify meta-theoretic properties of signatures, thereby 
checking proofs of meta-theorems presented as relations in LF. In particular, we 
can automatically verify the unique existence of specified output arguments in a 
relation with respect to some given input arguments. Our algorithm will always 
terminate, but, since the problem is in general undecidable, will sometimes fail 
to establish uniqueness even though it holds. 

Our algorithm extends prior work on coverage checking [24] and mode check- 
ing [18], which in combination with termination checking [16], can verify meta- 
theoretic proofs such as cut elimination [12], the Church-Rosser theorem [19],
logical translations [13], or the soundness of Foundational Typed Assembly Lan- 
guage [3,4]. The specific motivation for this work came mostly from the latter, 
in which a significant portion of the development was devoted to tedious but 

* This research has been supported by NSF Grant CCR-0306313. 
straightforward reasoning about equality. Our algorithms can automate much of
that. 

We believe that our techniques can be adapted to other systems of construc- 
tive type theory to recognize properties of relations. In that direction, the re- 
search can be seen as an extension of the work by McBride [11] and Coquand [2], 
who present procedures for deciding whether a definition by pattern matching of 
a dependently typed function consists of cases that are exhaustive and mutually 
exclusive. Here, we permit not only inputs containing abstractions, but also re- 
lational specifications, which are pervasive and unavoidable in constructive type 
theories. Like the prior work on functions, but unlike prior work on coverage 
checking [19,24], we can also verify uniqueness and unique existence. 

The remainder of the paper is organized as follows. In Section 2 we briefly 
introduce the notation of the LF type theory used throughout. In Section 3 we 
describe our algorithm for verifying uniqueness of specified arguments to rela- 
tions, and we prove its correctness in Section 4. In Section 5 we briefly review 
coverage checking, one of the central algorithms in verifying the correctness of 
meta-theoretic proofs. In Section 6 we show how to exploit uniqueness informa- 
tion to increase the power of coverage checking. We conclude in Section 7 with 
some further remarks about related and future work. 



2 The LF Type Theory 

We use a standard formulation of the LF type theory [7]; we summarize here 
only the basic notations. We use a for type families, c for object-level constants, 
and x for (object-level) variables. We say term to refer to an expression from
any of the three levels of kinds, types, and objects. 



    Kinds           K     ::=  type | Πx:A.K
    Types           A, B  ::=  a M₁ ... Mₙ | Πx:A.B
    Objects         M, N  ::=  c | x | λx:A.M | M N
    Signatures      Σ     ::=  · | Σ, a:K | Σ, c:A
    Contexts        Γ, Δ  ::=  · | Γ, x:A
    Substitutions   θ, σ  ::=  · | θ, M/x


Contexts and substitutions may declare a variable at most once; signatures 
may declare families and constants at most once. We do not distinguish terms 
from any of the three levels that differ only in the names of their bound variables. 
Our notion of definitional equality is βη-conversion, and we tacitly exploit the
property that every kind, type, and object has a unique long βη-normal form [1,
8] which we call canonical. The relatively simple nature of this definitional
equality avoids some thorny issues regarding intensional and extensional equality in
constructive type theories [10, 9] that would complicate our analysis. We omit
type-level λ-abstractions from the syntax since they do not occur in canonical
forms. The principal judgments we use are:

    Γ ⊢_Σ A : type     Type A is valid
    Γ ⊢_Σ M : A        Object M has type A
    Γ ⊢_Σ θ : Δ        Substitution θ matches context Δ

Since all judgments are standard we only show the last one, for typing
substitutions, which is perhaps less widely known. We write M[θ] and A[θ] for the
application of a substitution.

                       Γ ⊢ θ : Δ    Γ ⊢ M : A[θ]
    -----------       ----------------------------
     Γ ⊢ · : ·          Γ ⊢ (θ, M/x) : (Δ, x:A)

So a substitution Γ ⊢ θ : Δ maps a term defined over a context Δ to a term
over a context Γ. We write θ₁ ∘ θ₂ for composition of substitutions, so that
M[θ₁][θ₂] = M[θ₁ ∘ θ₂], and write id_Γ for the identity substitution Γ ⊢ id_Γ : Γ.

As a running example we use natural numbers defined in terms of zero (z) 
and successor (s), together with relations for inequality (le) and addition (plus).¹
The corresponding signature is given in Figure 1. Note that free variables in a 
declaration are implicitly universally quantified in that declaration; the Twelf 
implementation will reconstruct these quantifiers and the types of the free vari- 
ables [15]. 



    nat : type.
    z : nat.
    s : nat -> nat.

    le : nat -> nat -> type.
    le_refl : le X X.
    le_s : le X Y -> le X (s Y).

    plus : nat -> nat -> nat -> type.
    plus_z : plus z X X.
    plus_s : plus X1 X2 Y -> plus (s X1) X2 (s Y).



Fig. 1. Natural numbers with ordering and addition 
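
To make the relational reading of Figure 1 concrete, the sketch below enumerates the outputs of plus and le for ground inputs by trying every clause. It is a Python illustration under assumed conventions, not Twelf: terms are nested tuples such as `("s", ("z",))`, and the function names and the cut-off parameter `bound` are hypothetical.

```python
def numeral(n):
    """Build the term s (s ... z) with n successors."""
    term = ("z",)
    for _ in range(n):
        term = ("s", term)
    return term

def plus_outputs(x1, x2):
    """Yield every Y such that `plus x1 x2 Y` is derivable (ground x1, x2)."""
    if x1 == ("z",):            # plus_z : plus z X X.
        yield x2
    if x1[0] == "s":            # plus_s : plus X1 X2 Y -> plus (s X1) X2 (s Y).
        for y in plus_outputs(x1[1], x2):
            yield ("s", y)

def le_outputs(x, bound):
    """Yield Y with `le x Y` derivable, using le_s at most `bound` times."""
    yield x                      # le_refl : le X X.
    if bound > 0:                # le_s : le X Y -> le X (s Y).
        for y in le_outputs(x, bound - 1):
            yield ("s", y)
```

For ground inputs, plus yields exactly one output, whereas le yields many. This is the difference that the uniqueness analysis of the next section is meant to detect.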



3 Uniqueness Mode Checking 

Logical frameworks that support higher-order abstract syntax, such as LF or 
hereditary Harrop formulas, are based on a simply typed or dependently typed 

¹ This running example does not illustrate the higher-order nature of our analysis, but
unfortunately space constraints do not permit us to include larger and more realistic 
examples. However, we have executed the uniqueness checker against higher-order 
examples drawn from [3,4]. 




Verifying Uniqueness in a Logical Framework 






λ-calculus. Function spaces in such a calculus are purposely impoverished in
order to support the use of meta-language functions to represent object language 
abstractions and hypothetical proofs: too many such functions would invalidate 
the judgments-as-types or judgments-as-propositions methodology. In particular, 
these frameworks prohibit function definitions by cases or by primitive recursion. 
Adding such functions appears to require modal types or an explicit stratification 
of the type theory [5, 23, 20, 21]; related approaches are still a subject of current 
research (see, for example, [25,22]). 

The traditional and practically tested approach is to represent more complex 
functions as either type families or relations, depending on whether the framework
is a type theory or a logic.² In many cases relational representations of
functions are sufficient, but there are also many instances where meta-reasoning 
requires us to know that relations do indeed represent (possibly partial) func- 
tions. We can encode this property by defining explicit equality relations. For 
example, if we need to know that the relation plus is actually a function of its 
first two arguments, we can define 

    eq : nat -> nat -> type.
    refl : eq X X.

We then have to prove: “If plus X1 X2 Y and plus X1 X2 Y' then eq Y Y'.”

There are two difficulties with this approach: the first is simply that equality 
predicates need to be threaded through many judgments, and various rather 
trivial and tedious properties need to be proved about them. The second is that 
this methodology interferes with dependent typing because the equality between 
Y and Y' in the example above cannot be exploited by type-checking, since eq 
is just a user-declared relation. 

As uses of the meta-reasoning capabilities of the logical framework become 
increasingly complex [3,4], intrinsic support for recognizing and exploiting rela- 
tions that are indeed functions is becoming more and more important. There are 
two distinct, interconnected problems to be solved. The first is to verify that par- 
ticular relations are partial functions. The second is to exploit this information 
to verify that the same or other relations are total. 

In this section we address the former: how can we automatically verify that 
particular relations are partial functions of some of their inputs. This is a stricter 
version of mode checking familiar from logic programming. There, we designate 
some arguments to a relation as inputs and others as outputs. The property 
we verify with mode checking is that if the inputs are given as ground terms, 
and proof search succeeds, then the outputs will also be ground terms [18]. 
The sharpened version requires in addition that if proof search succeeds, then 
some designated outputs are uniquely determined. We refer to this process as 
uniqueness mode checking. 

² Even though we are working in the LF type theory, we will use the terms type family,
relation, and predicate interchangeably, expressing the intended meaning of the type 
families under consideration. 
In our applications we actually need to exploit a slightly stronger property:
if the designated input arguments to a relation are given as ground terms, then 
designated output arguments must be ground and uniquely determined, indepen- 
dent of the proof search strategy. In other words, our analysis must be based on 
a non-deterministically complete proof search strategy, rather than depth-first 
logic program execution (which is incomplete). 

We use terminology from logic programming in the description of our algo- 
rithm below. 

For the sake of simplicity we restrict the relations for which we verify unique- 
ness to consist of Horn clauses, which means the relations we analyze are induc- 
tively defined. However, the domains of quantification in the clauses are still 
arbitrary LF terms, which may be dependently typed and of arbitrary order. 

In our syntax for the Horn fragment, we refer to a constant declaration that
is to be analyzed as a clause. We group dependently occurring arguments into
a quantifier prefix ΠΓ and the non-dependent arguments into a conjunction of
subgoals G. We call the atomic type Q the head of a clause c : ΠΓ. G → Q. We
sometimes refer to a term with free variables that are subject to unification as a
pattern. All constructors for type families a appearing at the head of an atomic
goal in a program must also be part of the program and satisfy the Horn clause
restrictions.

    Atomic Goals   Q  ::=  a M₁ ... Mₙ
    Goals          G  ::=  Q | G₁ ∧ G₂ | ⊤
    Clauses        D  ::=  c : ΠΓ. G → Q
    Programs       𝒫  ::=  D₁, ..., Dₙ

In the implementation, we do not make this restriction and instead analyze 
arbitrary LF signatures, enriched with world declarations [19]. A description 
and correctness proof of this extension is the subject of current research and 
beyond the scope of this paper. 

Mode declarations. In order to verify mode properties of relations, we specify 
each argument of a relation to be either an input (+), an output (-), a unique 
output (-1), or unmoded (*). Intuitively, the declarations are tied to a non- 
deterministic proof search semantics and express: 

If all input (+) arguments to a predicate are ground when it is invoked, 
and search succeeds, then all output arguments are ground (-). Moreover, 
in all successful proofs, corresponding unique outputs (-1) must not only 
be ground, but equal. Unmoded arguments remain unconstrained. 

Mode information for a type family a is reflected in the functions ins(a), outs(a), 
and uouts(a), returning the sets of indices for the input arguments, output ar- 
guments, and unique output arguments respectively. 

In our example, the following declarations would be correct: 

    %mode le +X -Y.
    %mode plus +X1 +X2 -1Y.

The first one expresses that if a goal of the form le M Y for a ground term
M succeeds, then Y must also be ground. The second one expresses that every
successful search for a proof of plus M₁ M₂ Y with ground M₁ and M₂ yields
the same term N for Y. In other words, plus represents a partial function from
its first two arguments to its third argument. The second declaration yields
ins(plus) = {1,2}, outs(plus) = { }, and uouts(plus) = {3}.
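
The index sets can be read off a declaration mechanically. The sketch below is a simplified Python illustration that works on the textual form shown above (without the leading %mode keyword); `parse_mode` is a hypothetical helper, not part of Twelf.

```python
def parse_mode(decl):
    """Split a declaration such as 'plus +X1 +X2 -1Y' into index sets.

    Returns (family, ins, outs, uouts) with 1-based argument positions.
    """
    family, *args = decl.split()
    ins, outs, uouts = set(), set(), set()
    for index, arg in enumerate(args, start=1):
        if arg.startswith("+"):
            ins.add(index)
        elif arg.startswith("-1"):   # unique output; must be tested before '-'
            uouts.add(index)
        elif arg.startswith("-"):
            outs.add(index)
        # '*' (unmoded) arguments belong to none of the three sets
    return family, ins, outs, uouts
```

Applied to the second declaration, the helper returns the three sets stated above for plus.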

Our algorithm for uniqueness mode checking verifies two properties: disjoint- 
ness of input arguments and uniqueness of output arguments. 

Disjointness of inputs. For a given relation with some uniqueness modes on 
its output arguments, we verify that no two clause heads unify on their input 
arguments. This entails that any goal with ground input arguments unifies with 
no more than one clause head. As an example, consider the relation plus from 
Figure 1 with mode plus +X1 +X2 -1Y. Uniqueness mode checking verifies that
plus z X _ and plus (s X1) X2 _ do not have a unifier. This is easy because z
and s in the first argument clash. We use the algorithm in [6] which will always 
terminate, but may sometimes generate constraints that cannot be solved. In 
that case, uniqueness mode checking will fail. 
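
For first-order terms such as those of the running example, the disjointness test reduces to ordinary unification restricted to the input positions. The Python sketch below makes that restriction explicit. It is a first-order simplification and not the higher-order algorithm of [6]. Variables are strings, other terms are tuples, the two heads are assumed to use distinct variable names, and all function names are hypothetical.

```python
def walk(t, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """True if variable v occurs in term t under subst."""
    t = walk(t, subst)
    if isinstance(t, str):
        return t == v
    return any(occurs(v, arg, subst) for arg in t[1:])

def unify(t1, t2, subst):
    """Return an extended substitution, or None if the terms clash."""
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if isinstance(t1, str):
        if t1 == t2:
            return subst
        if occurs(t1, t2, subst):
            return None
        return {**subst, t1: t2}
    if isinstance(t2, str):
        return unify(t2, t1, subst)
    if t1[0] != t2[0] or len(t1) != len(t2):
        return None
    for a1, a2 in zip(t1[1:], t2[1:]):
        subst = unify(a1, a2, subst)
        if subst is None:
            return None
    return subst

def heads_disjoint(head1, head2, ins):
    """True if the heads cannot be unified on the argument positions in ins."""
    subst = {}
    for i in sorted(ins):        # head[i] is the i-th argument; head[0] is the family
        subst = unify(head1[i], head2[i], subst)
        if subst is None:
            return True
    return False

plus_z = ("plus", ("z",), "X", "X")
plus_s = ("plus", ("s", "X1"), "X2", ("s", "Y"))
```

Here `heads_disjoint(plus_z, plus_s, {1, 2})` holds because z and s clash in the first argument. The two heads of le, by contrast, unify on their single input, so le could not be given a unique output.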

Strictness. Because we can make the assumption that input arguments are 
ground, what is most relevant to our analysis is not full unification, but
higher-order dependently typed matching. Schürmann [19] has shown that each variable
in a higher-order matching problem that has at least one strict occurrence has 
a unique, ground solution. An occurrence of a variable is strict if it is applied 
to distinct bound variables and it is not in an argument to another unification 
variable (see [14] for a more formal definition). 

Strictness is central in our analysis to conclude that if matching a pattern 
against a ground term succeeds, variables with at least one strict occurrence 
in the pattern are guaranteed to be ground. In our specific situation, we actually
employ unification of two types a M₁ ... Mₙ = a N₁ ... Nₙ where certain
subproblems (for example, Mᵢ = Nᵢ for i ∈ ins(a)) are known to be matching
problems.

Checking uniqueness of outputs. Uniqueness of outputs is verified by an abstract 
non-deterministic logic programming interpreter with left-to-right subgoal
selection³. The domain used is the space of abstract substitutions with elements
unknown (u), ground (g), and unique (q) for each variable x where u carries no 
information and q the most information. Note that in order for a variable to be 
unique (q) it must also be ground. Variables of unknown status (u) may become 
known as ground (g) or unique (q) during analysis in the following situations: 

— An unknown variable that occurs in a strict position in an input argument 

of the clause head becomes known to be unique. 

³ Left-to-right subgoal selection is convenient for this abstract interpretation, but not
critical for its soundness. 
    Ψ ⊢ Q⁺¹ > Ψ'    all strict variables in inputs of Q are q in Ψ'
    Ψ ⊢ Q⁻ > Ψ'     all strict variables in outputs of Q are at least g in Ψ'
    Ψ ⊢ Q⁻¹ > Ψ'    all strict variables in unique outputs of Q are q in Ψ'

    Ψ ⊢ Q⁺¹         if all variables in inputs of Q are q
    Ψ ⊢ Q⁺          if all variables in inputs of Q are at least g
    Ψ ⊢ Q⁻¹         if all variables in unique outputs of Q are q
    Ψ ⊢ Q⁻          if all variables in outputs of Q are at least g



Fig. 2. Judgments on abstract substitutions 



— An unknown variable becomes known to be ground if it occurs in a strict
position in the output of a subgoal all of whose inputs are known to be 
ground or unique. 

— An unknown or ground variable becomes known to be unique if it occurs 
in a strict position in a unique output of a subgoal all of whose inputs are 
known to be unique. 

We next describe in detail the uniqueness mode checking for the Horn clause 
fragment of LF. The checker relies on two sets of judgments on abstract substitu- 
tions, which provide reliable, though approximate, information about the actual 
substitution at any point during search for a proof of a goal. The corresponding 
non-deterministic search strategy is explained in Section 4. 

    Abstract objects         μ  ::=  u | g | q
    Abstract substitutions   Ψ  ::=  · | Ψ, μ/x

The first set of judgments have the form Ψ ⊢ Qᵐ > Ψ' where Ψ is an abstract
substitution with known information, Q is an atomic predicate a M₁ ... Mₙ, m
indicates which arguments to a are analyzed, and Ψ' is the result of the analysis
of Q. Both Ψ and Ψ' will be defined on the free variables in Q. Moreover, Ψ' will
always contain the same or more information than Ψ.

The second set of judgments Ψ ⊢ Qᵐ hold if Q satisfies a property specified
by m given the information in Ψ. Again, if Γ ⊢ Q : type, then Ψ will be defined
on the variables in Γ. The various forms of these judgments are given in Figure 2.

The judgments on abstract substitutions are employed by the uniqueness
mode checker, which is itself based on two judgments: ⊢ c : ΠΓ. G → P for
checking clauses in the program, and Ψ ⊢ G > Ψ' for analyzing goals G, where
Ψ' may contain more information than Ψ and both Ψ and Ψ' are defined on the
free variables of G.

The mode checker is defined by the inference rules of Figure 3. We view these
rules as an algorithm for mode checking by assuming Ψ and G to be given, and
constructing Ψ' such that Ψ ⊢ G > Ψ', searching for derivations of the premises
from first to last. We write Ψ(Γ) for the abstract context corresponding to the
(concrete) context Γ, where each variable is marked as unknown (u).



Definition 1 (Mode correct programs). Given a program 𝒫, we write 𝒫(a)
for the set of clauses in 𝒫 with a as head. We say 𝒫 is mode-correct if

    Ψ(Γ) ⊢ P⁺¹ > Ψ₁    Ψ₁ ⊢ G > Ψ₂    Ψ₂ ⊢ P⁻    Ψ₂ ⊢ P⁻¹
    --------------------------------------------------------
                      ⊢ c : ΠΓ. G → P

    Ψ₀ ⊢ Q⁺¹    Ψ₀ ⊢ Q⁻ > Ψ₁    Ψ₁ ⊢ Q⁻¹ > Ψ₂
    --------------------------------------------
                   Ψ₀ ⊢ Q > Ψ₂

    Ψ₀ ⊢ Q⁺    Ψ₀ ⊢ Q⁻ > Ψ₁
    -------------------------
          Ψ₀ ⊢ Q > Ψ₁

    Ψ₀ ⊢ G₁ > Ψ₁    Ψ₁ ⊢ G₂ > Ψ₂
    ------------------------------        --------------
         Ψ₀ ⊢ G₁ ∧ G₂ > Ψ₂                 Ψ₀ ⊢ ⊤ > Ψ₀

Fig. 3. Uniqueness mode checking




1. For every type family a in 𝒫, if a is declared to have unique outputs, then
for any two distinct c₁ : ΠΓ₁. G₁ → Q₁ and c₂ : ΠΓ₂. G₂ → Q₂ in 𝒫(a), Q₁
and Q₂ are not unifiable on their inputs.

2. For every constant c declared in 𝒫, we have ⊢ c : ΠΓ. G → P.

Part (2) of the definition requires each predicate to have a mode declaration, 
but we may default this to consider all arguments unmoded (*) if none is given. 

As an example, consider once again the plus predicate from Figure 1 with
mode plus +X1 +X2 -1Y. We have to check clauses plus_z and plus_s. We present
the derivations in linear style, eliding arguments to predicates that are ignored
in any particular judgment.

    u/X ⊢ (plus z X _)⁺¹ > q/X
    q/X ⊢ ⊤ > q/X
    q/X ⊢ (plus _ _ _)⁻
    q/X ⊢ (plus _ _ X)⁻¹
    ⊢ plus_z : ΠX:nat. ⊤ → plus z X X

    u/X1, u/X2, u/Y ⊢ (plus (s X1) X2 _)⁺¹ > q/X1, q/X2, u/Y
    q/X1, q/X2, u/Y ⊢ (plus X1 X2 _)⁺¹
    q/X1, q/X2, u/Y ⊢ (plus _ _ _)⁻ > q/X1, q/X2, u/Y
    q/X1, q/X2, u/Y ⊢ (plus _ _ Y)⁻¹ > q/X1, q/X2, q/Y
    q/X1, q/X2, u/Y ⊢ plus X1 X2 Y > q/X1, q/X2, q/Y
    q/X1, q/X2, q/Y ⊢ (plus _ _ _)⁻
    q/X1, q/X2, q/Y ⊢ (plus _ _ (s Y))⁻¹
    ⊢ plus_s : ΠX1:nat. ΠX2:nat. ΠY:nat. plus X1 X2 Y → plus (s X1) X2 (s Y).
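
The two derivations can be replayed by a small abstract interpreter. The following Python sketch covers only first-order Horn clauses, where every variable occurrence is strict, and therefore sidesteps the strictness analysis entirely. It is an illustration under these assumptions rather than the Twelf implementation, and all names are hypothetical. Abstract substitutions are dictionaries from variable names to "u", "g", or "q".

```python
ORDER = {"u": 0, "g": 1, "q": 2}

def variables(t):
    """Set of variables (strings) occurring in a term or atom."""
    if isinstance(t, str):
        return {t}
    return set().union(*[variables(arg) for arg in t[1:]])

def raise_to(psi, terms, status):
    """Copy of psi with every variable of terms raised to at least status."""
    out = dict(psi)
    for t in terms:
        for v in variables(t):
            if ORDER[out[v]] < ORDER[status]:
                out[v] = status
    return out

def at_least(psi, terms, status):
    return all(ORDER[psi[v]] >= ORDER[status]
               for t in terms for v in variables(t))

def args_at(atom, positions):
    return [atom[i] for i in sorted(positions)]

def check_goal(psi, atom, modes):
    """Analyze an atomic subgoal; return the new psi, or None on failure."""
    ins, outs, uouts = modes[atom[0]]
    if at_least(psi, args_at(atom, ins), "q"):
        # Rule with unique outputs: Q+1, then Q- > psi1, then Q-1 > psi2.
        psi = raise_to(psi, args_at(atom, outs), "g")
        return raise_to(psi, args_at(atom, uouts), "q")
    if at_least(psi, args_at(atom, ins), "g"):
        # Rule without uniqueness: Q+, then Q- > psi1.
        return raise_to(psi, args_at(atom, outs), "g")
    return None

def check_clause(head, subgoals, modes):
    """Check one clause; return the final psi, or None if the check fails."""
    ins, outs, uouts = modes[head[0]]
    all_vars = variables(head).union(*[variables(g) for g in subgoals])
    psi = {v: "u" for v in all_vars}                  # every variable unknown
    psi = raise_to(psi, args_at(head, ins), "q")      # head inputs become q
    for goal in subgoals:                             # left to right
        psi = check_goal(psi, goal, modes)
        if psi is None:
            return None
    ok = (at_least(psi, args_at(head, outs), "g")
          and at_least(psi, args_at(head, uouts), "q"))
    return psi if ok else None

modes = {"plus": ({1, 2}, set(), {3})}
plus_z = (("plus", ("z",), "X", "X"), [])
plus_s = (("plus", ("s", "X1"), "X2", ("s", "Y")),
          [("plus", "X1", "X2", "Y")])
```

On plus_z and plus_s the sketch ends with every variable at q, mirroring the last lines of the two derivations. A clause such as plus X1 X2 Y with no subgoals is rejected because Y is still unknown when the head's unique output is checked.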



4 Correctness of Uniqueness Mode Checking 

We next define a non-deterministic operational semantics for the Horn fragment 
and show that uniqueness mode checking approximates it. The judgment has 
the form θ ⊨ G > θ', where θ and θ' are substitutions for the free variables
in goal G. We think of θ and goal G as given and construct a derivation and
substitution θ'.

The semantics is given by the system of Figure 4. In the first rule, σ and θ₁
represent substitutions that unify P and Q[θ₀]. The fact that these substitutions
may not be computable, or that they may not be most general, does not concern
us here, since uniqueness mode checking guarantees that any unifier must ground
all variables in P that have a strict occurrence in the input arguments of P,
provided the input arguments of Q[θ₀] are ground.

    ΠΓ. G → P ∈ 𝒫    P[σ] = Q[θ₀][θ₁]    σ ⊨ G > θ₂
    --------------------------------------------------
                θ₀ ⊨ Q > θ₀ ∘ θ₁ ∘ θ₂

    θ₀ ⊨ G₁ > θ₁    θ₁ ⊨ G₂ > θ₂
    ------------------------------        --------------
         θ₀ ⊨ G₁ ∧ G₂ > θ₂                 θ₀ ⊨ ⊤ > θ₀

Fig. 4. Operational semantics



Definition 2 (Approximation). We define when an abstract substitution
approximates a set of substitutions as follows: Given an abstract substitution Ψ : Γ
and a set Θ of substitutions Γᵢ ⊢ θᵢ : Γ, we say Ψ approximates Θ (Ψ ≺ Θ) if
for every x in the domain of Ψ

1. if Ψ(x) = g then for all θᵢ ∈ Θ, θᵢ(x) is ground, and

2. if Ψ(x) = q then for some ground term M and all θᵢ ∈ Θ, θᵢ(x) = M.
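
The approximation relation can be checked directly on small examples. In the Python sketch below, which uses hypothetical names and a first-order term representation (variables are strings, other terms are tuples), an abstract substitution is a dictionary from variables to "u", "g", or "q" and each concrete substitution is a dictionary from variables to terms.

```python
def is_ground(term):
    """A term is ground if it contains no variables (strings)."""
    if isinstance(term, str):
        return False
    return all(is_ground(arg) for arg in term[1:])

def approximates(psi, thetas):
    """Check the two conditions of the approximation relation for psi and thetas."""
    for x, status in psi.items():
        values = [theta[x] for theta in thetas]
        # Condition 1 (also implied by condition 2): every value is ground.
        if status in ("g", "q") and not all(is_ground(v) for v in values):
            return False
        # Condition 2: all substitutions agree on one ground term.
        if status == "q" and any(v != values[0] for v in values):
            return False
    return True                  # status "u" imposes no constraint
```

With `zero = ("z",)` and `one = ("s", zero)`, the abstract substitution q/Y approximates two substitutions that both send Y to `one`, but not a pair sending Y to `zero` and `one`. The weaker g/Y approximates both sets.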

Lemma 1 (Soundness of uniqueness mode checking). Let 𝒫 be a mode-correct
program, G a goal, Ψ : Γ an abstract substitution such that Ψ ⊢ G > Ψ',
and Θ a set of substitutions. If Ψ ≺ Θ then Ψ' ≺ {ρ | θ ⊨ G > ρ, θ ∈ Θ}.

Proof. We let D be the set of all derivations of θ ⊨ G > ρ for all θ ∈ Θ. We
show by induction on pairs (d, d') of derivations in D, where d derives θ ⊨ G > ρ
and d' derives θ' ⊨ G > ρ', that if Ψ ≺ {θ, θ'} then Ψ' ≺ {ρ, ρ'}. Since d, d' are
arbitrary the lemma follows for the whole set.

The only nontrivial case is that of an atomic goal Q where the mode checking
derivation for Q has the form

    Ψ₀ ⊢ Q⁺¹          (input variables of Q must be mapped to q)
    Ψ₀ ⊢ Q⁻ > Ψ₁      (output variables of Q are mapped to g)
    Ψ₁ ⊢ Q⁻¹ > Ψ₂     (unique output variables of Q are mapped to q)
    ----------------
    Ψ₀ ⊢ Q > Ψ₂
The two derivations d and d' have the form

    ΠΓ. G → P ∈ 𝒫               ΠΓ'. G' → P' ∈ 𝒫
    P[σ] = Q[θ₀][θ₁]             P'[σ'] = Q[θ'₀][θ'₁]
    σ ⊨ G > θ₂                   σ' ⊨ G' > θ'₂
    ------------------------     ---------------------------
    θ₀ ⊨ Q > θ₀ ∘ θ₁ ∘ θ₂       θ'₀ ⊨ Q > θ'₀ ∘ θ'₁ ∘ θ'₂

Write θ_out for θ₀ ∘ θ₁ ∘ θ₂ and θ'_out for θ'₀ ∘ θ'₁ ∘ θ'₂.

It is easy to see, for each input variable x of Q, that θ_out(x) = θ'_out(x) =
θ₀(x) = θ'₀(x), so the approximation relation is satisfied for the input variables
of Q.

For the output variables, there are two subcases: either there is uniqueness
information for the type family of Q, so that only one clause head can match Q,
or there is no uniqueness information.

For the first subcase P = P' and G = G'. We use the mode correctness of
the program to obtain the subgoal mode check Ψ'₁ ⊢ G > Ψ'₂, where Ψ'₁ enforces
the mode annotations for the input and output variables of P. Ψ'₁ ≺ {σ, σ'}, so
by induction Ψ'₂ ≺ {θ₂, θ'₂}. Then θ_out and θ'_out satisfy the mode annotations for
the output variables of Q, as required.

For the second subcase the reasoning is similar, but there are no output
uniqueness requirements and more than one clause head can match Q. □

Lemma 2 (Completeness of non-deterministic search). Given Δ ⊢ Q :
type. If Q contains only ground terms in its input positions, and there is a
substitution θ and term M such that · ⊢ M : Q[θ], then id_Δ ⊨ Q > θ' and there
is a substitution θ'' such that θ = θ' ∘ θ''.

Proof. The proof is standard, using induction on the structure of M , exploiting 
the non-deterministic nature of the operational semantics to guess the right 
clauses and unifying substitutions. □ 



5 Coverage 

Coverage checking is the problem of deciding whether any closed term of a 
given type is an instance of at least one of a given set of patterns. Our work 
on exploiting uniqueness information in coverage checking is motivated by its 
application to proof assistants and proof checkers, where it can be used to check 
that all possible cases in the definition of a function or relation are covered. The 
coverage problem and an approximation algorithm for coverage checking in LF 
are described in [24], extending prior work by Coquand [2] and McBride [11]. 

More precisely, a coverage problem is given by a coverage goal and a set 
of patterns. In our setting it is sufficient to consider coverage goals that are 
types with free variables A\- A : type; it is straightforward to translate general 
coverage goals to this form. 
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Definition 3 (Immediate Coverage). We say a coverage goal A \- A : type 
is immediately covered by a collection of patterns Ai\- Ai : type if there is an i 
and a substitution A\- ai \ Ai such that A\- A = Ai[ai] : type. 

Coverage requires immediate coverage of every ground instance of a goal. 

Definition 4 (Coverage). We say $\Delta \vdash A : \mathsf{type}$ is covered by a collection of
patterns $\Delta_i \vdash A_i : \mathsf{type}$ if every ground instance $\cdot \vdash A[\tau] : \mathsf{type}$ for $\cdot \vdash \tau : \Delta$ is
immediately covered by the collection $\Delta_i \vdash A_i : \mathsf{type}$.

As an example, consider again the plus predicate from Figure 1. We have 
already shown that the output of plus, if it exists, is unique. In order to show 
that plus is a total function of its first two arguments, we need to show that 
it always terminates (which is easy — see [18, 16]), and that the inputs cover 
all cases. For the latter requirement, we transform the signature into coverage 
patterns by eliding the outputs: 

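$X{:}\mathsf{nat} \vdash \mathsf{plus}\ \mathsf{z}\ X\ \_$

$X_1{:}\mathsf{nat},\ X_2{:}\mathsf{nat} \vdash \mathsf{plus}\ (\mathsf{s}\ X_1)\ X_2\ \_$

The coverage goal:

$Y_1{:}\mathsf{nat},\ Y_2{:}\mathsf{nat} \vdash \mathsf{plus}\ Y_1\ Y_2\ \_$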

In this example, the goal is covered by the two patterns since every ground
instance of the goal $\mathsf{plus}\ M_1\ M_2\ \_$ will be an instance of one of the two patterns.
However, the goal is not immediately covered because $Y_1$ clashes with $\mathsf{z}$ in the
first pattern and $\mathsf{s}$ in the second.

When a goal $\Delta \vdash A : \mathsf{type}$ is not immediately covered by any pattern, the
algorithm makes use of an operation called splitting, which produces a set of
new coverage goals by partially instantiating free variables in $A$. The original
goal is covered if and only if every one of the resulting goals is covered. Intuitively,
splitting works by selecting a variable $u$ in $A$, and instantiating it to all possible
top-level structures based on its type.

In the example, the clashes of $Y_1$ with $\mathsf{z}$ and $\mathsf{s}$ suggest splitting of $Y_1$, which
yields two new coverage goals

$Y{:}\mathsf{nat} \vdash \mathsf{plus}\ \mathsf{z}\ Y\ \_$

$Y_1{:}\mathsf{nat},\ Y_2{:}\mathsf{nat} \vdash \mathsf{plus}\ (\mathsf{s}\ Y_1)\ Y_2\ \_$

These are immediately covered by the first and second pattern, respectively, but
in general many splitting operations may be necessary.
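
The interplay of immediate coverage and splitting on the plus example can be sketched in Python. This is an illustration only, not the algorithm of [24]: it assumes first-order terms, treats every variable as having type nat, and bounds the number of splits with a hypothetical `fuel` parameter. All function names are our own.

```python
import itertools

_fresh = itertools.count()  # supply of fresh variable suffixes (hypothetical helper)

def is_var(t):
    # Variables are strings; constructor terms are tuples such as ("z",) or ("s", t).
    return isinstance(t, str)

def match_pattern(pat, term, subst):
    """One-sided matching: bind variables of `pat` so that it equals `term`."""
    if is_var(pat):
        if pat in subst:
            return subst if subst[pat] == term else None
        return {**subst, pat: term}
    if is_var(term) or pat[0] != term[0] or len(pat) != len(term):
        return None  # clash, e.g. goal variable Y1 against constructor z
    for p, t in zip(pat[1:], term[1:]):
        subst = match_pattern(p, t, subst)
        if subst is None:
            return None
    return subst

def clash_var(pat, term):
    """Return a goal variable that a constructor in `pat` clashes with, if any."""
    if is_var(pat):
        return None
    if is_var(term):
        return term
    if pat[0] != term[0]:
        return None
    for p, t in zip(pat[1:], term[1:]):
        v = clash_var(p, t)
        if v is not None:
            return v
    return None

def substitute(term, var, value):
    if is_var(term):
        return value if term == var else term
    return (term[0],) + tuple(substitute(a, var, value) for a in term[1:])

def split(goal, var):
    """Instantiate `var` (assumed to be of type nat) to z and to s of a fresh variable."""
    fresh = f"{var}_{next(_fresh)}"
    return [substitute(goal, var, ("z",)), substitute(goal, var, ("s", fresh))]

def covered(goal, patterns, fuel=5):
    if any(match_pattern(p, goal, {}) is not None for p in patterns):
        return True  # immediately covered
    if fuel == 0:
        return False  # give up: the check is deliberately incomplete
    for p in patterns:
        v = clash_var(p, goal)
        if v is not None:
            return all(covered(g, patterns, fuel - 1) for g in split(goal, v))
    return False  # no candidate for splitting

# Patterns and goal for plus, with the output argument elided.
patterns = [("plus", ("z",), "X"), ("plus", ("s", "X1"), "X2")]
goal = ("plus", "Y1", "Y2")
print(covered(goal, patterns))
```

Here the goal is not immediately covered, but one split on `Y1` yields two goals that are, so `covered` returns `True`. Dropping the second pattern makes the check fail.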

The process of repeated splitting of variables in goals that are not yet covered
immediately will eventually terminate according to the algorithm in [24], namely
when the failed attempts to immediately cover a goal no longer suggest any
promising candidates for splitting. Unfortunately, this algorithm is by necessity
incomplete, since coverage is in general an undecidable property. Sometimes,
this is due to a variable $x{:}B$ in a coverage goal which has no ground instances,
in which case the goal is vacuously covered. Sometimes, however, the coverage
checker reaches a situation where several terms must be equal in order to obtain
immediate coverage. It is in these situations that uniqueness information can
help, as we explain in the next section.

preserv : plus X1 X3 Y -> plus X2 X3 Y' -> le X1 X2 -> le Y Y' -> type.
preserv_refl : preserv S1 S2 le_refl le_refl.
preserv_s : preserv S1 S2 L L' -> preserv S1 (plus_s S2) (le_s L) (le_s L').

Fig. 5. Addition preserves ordering



6 Uniqueness in Coverage 

We begin with an example that demonstrates failure of coverage due to the 
absence of uniqueness information. 

Given type families for natural numbers, addition, and ordering, a proof that
addition of equals preserves ordering can be encoded as the relation preserv in
Figure 5. Note that, as before, free variables are implicitly quantified on each
clause. Moreover, arguments to type families whose quantifiers were omitted
earlier (as, for example, $\Pi X{:}\mathsf{nat}$ in the clause $\mathsf{le\_refl} : \mathsf{le}\ X\ X$) are also omitted,
and determined by type reconstruction as in the Twelf implementation [15].

In order to verify that preserv constitutes a meta-theoretic proof, we need to
verify that for all inputs $S_1 : \mathsf{plus}\ X_1\ X_3\ Y$, $S_2 : \mathsf{plus}\ X_2\ X_3\ Y'$, and $L : \mathsf{le}\ X_1\ X_2$
there exists an output $L' : \mathsf{le}\ Y\ Y'$ which witnesses that $x_1 + x_3 \le x_2 + x_3$ if
$x_1 \le x_2$.

The initial coverage goal has the form

$X_1{:}\mathsf{nat},\ X_2{:}\mathsf{nat},\ X_3{:}\mathsf{nat},\ Y{:}\mathsf{nat},\ Y'{:}\mathsf{nat},$
$S_1{:}\mathsf{plus}\ X_1\ X_3\ Y,\ S_2{:}\mathsf{plus}\ X_2\ X_3\ Y',\ L{:}\mathsf{le}\ X_1\ X_2 \vdash \mathsf{preserv}\ S_1\ S_2\ L\ \_$

This fails, and after one step of splitting on the variable $L$ we obtain two cases,
the second of which is seen to be covered by the preserv_s clause after one further
splitting step, while the first has the form

$X_1{:}\mathsf{nat},\ X_3{:}\mathsf{nat},\ Y{:}\mathsf{nat},\ Y'{:}\mathsf{nat},\ S_1{:}\mathsf{plus}\ X_1\ X_3\ Y,\ S_2{:}\mathsf{plus}\ X_1\ X_3\ Y'$
$\vdash \mathsf{preserv}\ S_1\ S_2\ \mathsf{le\_refl}\ \_$

The clause preserv_refl does not immediately cover this case, because the types of
the two variables $S_1$ and $S_2$ in this clause are the same, namely $\mathsf{plus}\ X_1\ X_3\ Y$. This
is because the use of reflexivity for inequality in the third and fourth arguments
of the clause requires $X_1 = X_2$ and $Y = Y'$. Our extended coverage checker
will allow us to show automatically that this case is covered by exploiting the
uniqueness information for plus.

We first define the situations in which uniqueness information may potentially
be helpful, depending on the outcome of a unification problem. We then show
how to exploit the result of unification to specialize a coverage goal.

Definition 5 (Specializing a coverage goal). Given a mode-correct program
$\mathcal{P}$ containing a type family $a$ with unique outputs, and a coverage goal $\Delta \vdash A : \mathsf{type}$,
uniqueness specialization for $a$ may be applicable if

1. $\Delta$ contains distinct assumptions $x_1 : a\ M_1 \ldots M_n$ and $x_2 : a\ N_1 \ldots N_n$, and

2. for all $i \in \mathit{ins}(a)$, $M_i = N_i$, and

3. for some $k \in \mathit{uouts}(a)$, $M_k \neq N_k$.

To specialize the goal, attempt simultaneous higher-order unification of $M_k$ with
$N_k$ for all $k \in \mathit{uouts}(a)$. If a most general pattern unifier (mgu) for this problem
exists, write it as $\Delta' \vdash \sigma : \Delta$, and generate a new specialized goal $\Delta' \vdash A[\sigma] : \mathsf{type}$.

There are three possible outcomes of the given higher-order unification prob- 
lem, with the algorithm in [6]: (1) it may yield an mgu, in which case the special- 
ized coverage goal is equivalent to the original one but has fewer variables, (2) 
it may fail, in which case the original goal is vacuously covered (that is, it has 
no ground instances), or (3) the algorithm may report remaining constraints, 
in which case this specialization is not applicable. Assertions (1) and (2) are 
corollaries of the next two lemmas. 

Lemma 3. If uniqueness information for a type family $a$ is potentially applicable
to a coverage goal $g = \Delta \vdash A : \mathsf{type}$, but no unifier exists, then there are no
ground instances of $g$ (and thus $g$ is vacuously covered by any set of patterns).

Proof. Assume we had a substitution $\cdot \vdash \theta : \Delta$ (so that $A[\theta]$ is ground). Using
the notation from Definition 5, we have $M_i = N_i$ for all $i \in \mathit{ins}(a)$ and therefore
$M_i[\theta] = N_i[\theta]$. By Lemma 2, we have $\cdot \models (a\ M_1 \ldots M_n)[\theta] > \theta_1$ and
$\cdot \models (a\ N_1 \ldots N_n)[\theta] > \theta_2$. Since the empty abstract substitution approximates the
empty substitution, we know by Lemma 1 that for all $k \in \mathit{uouts}(a)$, $M_k[\theta] = N_k[\theta]$.
But this is impossible since for at least one $k \in \mathit{uouts}(a)$, $M_k$ and $N_k$
were non-unifiable. □

Lemma 4. Let $g = \Delta \vdash A : \mathsf{type}$ be a coverage goal, and $\mathcal{P}$ a mode-correct
program with uniqueness information for $a$ potentially applicable to $g$. If an mgu
$\Delta' \vdash \sigma : \Delta$ exists and leads to coverage goal $\Delta' \vdash A[\sigma] : \mathsf{type}$, then every ground
instance $A[\theta]$ of $A$ is equal to a ground instance of $A[\sigma]$.

Proof. As in the proof of the preceding lemma, assume $\cdot \vdash \theta : \Delta$ (so that $A[\theta]$ is
ground). Again we have $M_i = N_i$ for all $i \in \mathit{ins}(a)$ and therefore $M_i[\theta] = N_i[\theta]$.
By Lemma 2, we have $\cdot \models (a\ M_1 \ldots M_n)[\theta] > \theta_1$ and $\cdot \models (a\ N_1 \ldots N_n)[\theta] > \theta_2$
for some $\theta_1$ and $\theta_2$. From Lemma 1 we now know that for all $k \in \mathit{uouts}(a)$,
$M_k[\theta] = N_k[\theta]$. But, by assumption, $\sigma$ is a most general simultaneous unifier of
$M_k = N_k$ for all $k \in \mathit{uouts}(a)$. Hence $\theta = \sigma \circ \theta'$ for some $\theta'$ and
$A[\theta] = A[\sigma \circ \theta'] = (A[\sigma])[\theta']$. □

We return to the coverage checking problem for the type family of Figure 5.
As observed above, without uniqueness information for plus it cannot be seen
that all cases are covered. The failed coverage goal is

$X_1{:}\mathsf{nat},\ X_3{:}\mathsf{nat},\ Y{:}\mathsf{nat},\ Y'{:}\mathsf{nat},\ S_1{:}\mathsf{plus}\ X_1\ X_3\ Y,\ S_2{:}\mathsf{plus}\ X_1\ X_3\ Y'$
$\vdash \mathsf{preserv}\ S_1\ S_2\ \mathsf{le\_refl}\ \_$

Exploiting uniqueness information for plus, we have the unification problem
$Y = Y'$, with mgu $Y/Y'$, yielding the new goal

$X_1{:}\mathsf{nat},\ X_3{:}\mathsf{nat},\ Y{:}\mathsf{nat},\ S_1{:}\mathsf{plus}\ X_1\ X_3\ Y,\ S_2{:}\mathsf{plus}\ X_1\ X_3\ Y$
$\vdash \mathsf{preserv}\ S_1\ S_2\ \mathsf{le\_refl}\ \_$

Since $S_1$ and $S_2$ have the same type, the new goal is immediately covered by the
clause preserv_refl, completing the check of the original coverage goal.
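
The specialization step of Definition 5 can be sketched in Python on this example. The sketch makes simplifying assumptions that are not part of the paper: terms are first-order, so ordinary first-order unification stands in for higher-order pattern unification, equality of inputs is tested syntactically, and only the context is specialized. The names `unify`, `resolve` and `specialize` are hypothetical.

```python
def walk(t, s):
    # Follow variable bindings; variables are strings, other terms are tuples.
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t == v
    return any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """First-order unification; returns an extended substitution or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, str):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def resolve(t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(resolve(a, s) for a in t[1:])

def specialize(ctx, family, ins, uouts):
    """Try uniqueness specialization for `family` on the context `ctx`."""
    names = [x for x, ty in ctx.items() if ty[0] == family]
    for pos, x1 in enumerate(names):
        for x2 in names[pos + 1:]:
            m, n = ctx[x1][1:], ctx[x2][1:]
            same_inputs = all(m[j] == n[j] for j in ins)
            differ = any(m[k] != n[k] for k in uouts)
            if same_inputs and differ:
                s = {}
                for k in uouts:
                    s = unify(n[k], m[k], s)
                    if s is None:
                        return "vacuous", None  # no ground instances
                new_ctx = {x: resolve(ty, s) for x, ty in ctx.items() if x not in s}
                return "specialized", new_ctx
    return "not applicable", ctx

ctx = {
    "X1": ("nat",), "X3": ("nat",), "Y": ("nat",), "Y'": ("nat",),
    "S1": ("plus", "X1", "X3", "Y"),
    "S2": ("plus", "X1", "X3", "Y'"),
}
# plus has inputs at argument positions 0 and 1 and a unique output at position 2.
status, new_ctx = specialize(ctx, "plus", ins=(0, 1), uouts=(2,))
print(status, new_ctx)
```

After the call, `Y'` has been eliminated and `S1` and `S2` share the type `plus X1 X3 Y`, which is what lets preserv_refl cover the goal. If the two outputs were non-unifiable, for instance `z` against `s Y`, the function reports the goal as vacuous.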

7 Conclusion 

We have described an algorithm for verifying uniqueness of specified output ar- 
guments of a relation, given specified input arguments. We have also shown how 
to exploit this information in coverage checking, which, together with termina- 
tion checking, can guarantee the existence of output arguments when given some 
inputs. We can therefore also verify unique existence, by separately verifying ex- 
istence and uniqueness. While our algorithms can easily be seen to terminate, 
they are by necessity incomplete, since both uniqueness and coverage with re- 
spect to ground terms are undecidable in our setting of LF. 

The uniqueness mode checker of Section 3 has been fully implemented as 
described. In fact, it allows arbitrary signatures, rather than just Horn clauses 
at the top level, although our critical correctness proof for Lemma 1 has not yet 
been extended to the more general case. We expect to employ a combination of 
the ideas from [18] and [19] to extend the current proof. In practice, we have 
found the behavior of the uniqueness checker to be predictable and the error 
messages upon failure to be generally helpful. 

We are considering three further extensions to the uniqueness mode checker, 
each of which is relatively straightforward from the theoretical side. The first 
is to generalize left-to-right subgoal selection to be instead non-deterministic. 
This would allow verification of uniqueness for more signatures that were not 
intended to be executed with Twelf’s operational semantics. The second would 
be to check that proof terms (and not just output arguments) will be ground or 
ground and unique. That would enable additional goal specialization in coverage 
checking. The third is to integrate the idea of factoring [17] in which overlapping 
clauses are permitted as long as they can be seen to be (always!) disjoint on the 
result of some subgoal. 

In terms of implementation, we have not yet extended the coverage checker 
implementation in Twelf to take advantage of uniqueness information. Since 
specialization always reduces the complexity of the coverage goal when appli- 
cable, we propose an eager strategy, comparing inputs of type families having 
some unique outputs whenever possible. Since terms in the context tend to be 
rather small, we do not expect this to have any significant impact on overall 
performance.

Finally, we would like to redo the theory of foundational proof-carrying
code [3, 4] taking advantage of uniqueness modes to obtain a concrete measure 
of the improvements in proof size in a large-scale example. We expect that most 
uses of explicit equality predicates and the associated proofs of functionality 
can be eliminated in favor of uniqueness mode checking and extended coverage 
checking. As a small proof of concept, we have successfully uniqueness-checked 
four type families in the theory, amounting to about 150 lines of Twelf code 
in which the use of functional arguments is pervasive. Combined with coverage 
checking, these checks might eliminate perhaps 250 lines of proof.



A Program Logic for Resource Verification



Abstract. We present a program logic for reasoning about resource consumption
of programs written in Grail, an abstract fragment of the Java Virtual Machine 
Language. Serving as the target logic of a certifying compiler, the logic exploits 
Grail’s dual nature of combining a functional interpretation with object-oriented 
features and a cost model for the JVM. We present the resource-aware operational 
semantics of Grail, the program logic, and prove soundness and completeness. All 
of the work described has been formalised in the theorem prover Isabelle/HOL, 
which provides us with an implementation of the logic as well as confidence in
the results. We conclude with examples of using the logic for proving resource 
bounds on code resulting from compiling high-level functional programs. 



1 Introduction 

For the effective use of mobile code, resource consumption is of great concern. A user 
who downloads an application program onto his mobile phone wants to know that the 
memory requirement of executing the program does not exceed the memory space avail- 
able on the phone. Likewise, concerns occur in Grid computing where service providers 
want to know that user programs adhere to negotiated resource policies and users want 
to be sure that their program will not be terminated abruptly by the scheduler due to 
violations of some resource constraints. 

The Mobile Resource Guarantees (MRG) project [27] is developing Proof-Carrying 
Code (PCC) technology [23] to endow mobile code with certificates of bounded re- 
source consumption. Certificates in the PCC sense contain proof-theoretic evidence. A 
service provider can check a certificate to see that a given resource policy will be ad- 
hered to before admitting the code to run. The feasibility of the PCC approach relies 
on the observation that, while it may be difficult to produce a formal proof of a cer- 
tain program property, it should be easy to check such a proof. Furthermore, resource 
properties are in many cases easier to verify than general correctness properties. 

K. Slind et al. (Eds.): TPHOLs 2004, LNCS 3223, pp. 34-49, 2004.
© Springer-Verlag Berlin Heidelberg 2004

Following the PCC paradigm the code producer uses a combination of program
annotations and analysis to construct a machine proof that a resource policy is met.
The proof is expressed in a specialized program logic for the language in which the
code is transmitted. In the MRG project, this target language is Grail [4], an abstract
representation of (a subset of) the Java Virtual Machine Language (JVML). Certificate 
generation is performed by a certifying compiler, e.g. [7], which transforms programs 
written in MRG’s high-level functional language Camelot into Grail [17]. Certificates 
are based on Camelot-level type systems for reasoning about resource consumption of 
functional programs [3, 11, 12]. For example, the Camelot program

let rev l acc = match l with Nil@d -> acc
                           | Cons(h,t)@d -> rev t (Cons(h,acc)@d)

for reversing a list does not consume heap space. In the match statement, the annotation
@d names the heap cell inhabited by the value, so that it can be reused when constructing
new list nodes in the body. Restrictions on the usage of such annotations are the
subject of the type system [3, 11] and we have an automatic inference of such annotations
for Camelot [12]. Indeed, we will prove later that the Grail code emitted for rev
by our compiler does not allocate memory.
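
The effect of reusing the matched cell can be illustrated with a small Python sketch. It is only an analogy for the Camelot annotation, not Camelot or Grail semantics: the class `Cons`, its allocation counter and the helpers `from_list` and `to_list` are hypothetical, and Nil is modelled as `None`.

```python
class Cons:
    allocations = 0  # counts how many list cells have been created

    def __init__(self, head, tail):
        Cons.allocations += 1
        self.head, self.tail = head, tail

def rev(l, acc=None):
    # Each step reuses the matched cell for the new node, mirroring
    # `Cons(h,t)@d -> rev t (Cons(h,acc)@d)`; no new Cons is created.
    while l is not None:
        t = l.tail
        l.tail = acc
        acc, l = l, t
    return acc

def from_list(items):
    acc = None
    for item in reversed(items):
        acc = Cons(item, acc)
    return acc

def to_list(cell):
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out

xs = from_list([1, 2, 3])
allocated_before = Cons.allocations
ys = rev(xs)
print(to_list(ys), Cons.allocations - allocated_before)
```

The allocation counter is unchanged by `rev`, which is the property the paper later establishes for the compiled Grail code by formal proof rather than by observation.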

Contributions: We introduce a resource-aware program logic for Grail in which the 
certificates are expressed (Sections 2 and 3). The presentation of the logic follows the 
approach of the Vienna Development Method (VDM), a variation of Hoare-style pro- 
gram logic where assertions may refer to initial as well as to final states [14]. In our 
case, pre- and post-conditions are combined into single assertions ranging over pre- and
post-heap, the environment in which the Grail expression is evaluated, the result value, 
and a component for the consumption of temporal and spatial resources. We discuss 
the meta-theoretic properties of soundness and (relative) completeness of the logic with 
respect to the functional operational semantics of Grail, based on a full formalisation 
in the theorem prover Isabelle/HOL. Since the program logic and its implementation 
are part of the trusted code base of the PCC infrastructure, it is essential for the over- 
all security of the system to have such results available. Our formalisation builds upon 
previous work on embedding program logics in theorem provers, in particular that of 
Kleymann [15] and Nipkow [24] (see Section 5 for details). In contrast to that, our 
logic features a semantics that combines object-oriented aspects with a functional-style 
big-step evaluation relation, and includes a treatment of resource consumption that is 
related to a cost model for the execution of Grail on a virtual machine platform. The 
logic is tailored so that it can be proven sound and complete while at the same time it 
can be refined to be used for PCC-oriented program verification. This has influenced the 
departure from the more traditional Hoare format, where the need of auxiliary variables 
to propagate intermediate results from pre- to post-assertions is a serious issue w.r.t. au- 
tomation. As a main technical result, we give a novel treatment of rules for mutually 
recursive procedures and adaptation that do not need separate judgements or a very 
complex variation of the consequence rule, but are elegantly proven admissible. Our 
focus on using Grail as an intermediate language, namely as the target of Camelot com- 
pilation, also motivates the decision not to provide a full treatment of object-oriented 
features such as inheritance and overriding. The expressiveness of our logic is demon- 
strated by verifying in Isabelle/HOL some resource properties of heap-manipulating 
Grail programs that were obtained by compiling Camelot programs (Section 4).



2 Grail

The Grail language [4] was designed as a compromise between raw bytecode and low- 
level functional languages, and serves as the target of the Camelot compilation. While 
the object and method structure of bytecode is retained, each method body consists 
of a set of mutually tail-recursive first-order functions. The syntax comprises instruc- 
tions for object creation and manipulation, method invocation and primitive operations 
such as integer arithmetic, as well as let-bindings to combine program fragments. In 
the context of the Camelot compiler, static methods are of particular interest. Using a 
whole-program compilation approach, all datatypes are implemented by a single Grail 
class, the so-called “diamond” class [11], and functions over these datatypes result in 
distinct static methods operating on objects of this class [17]. The main characteristic 
of Grail is its dual identity: its (impure) call-by-value functional semantics is shown 
to coincide with an imperative interpretation of the expansion of Grail programs into 
the Java Virtual Machine Language, provided that some mild syntactic conditions are 
met. In particular, these require that actual arguments in function calls coincide syntac- 
tically with the formal parameters of the function definitions. This allows function calls 
to be interpreted as immediate jump instructions since register shuffling at basic block 
boundaries is performed by the calling code rather than being built into the function 
application rule. Consequently, the consumption of resources at virtual machine level 
may be expressed in a functional semantics for Grail: the expansion into JVML does 
not require register allocation or the insertion of gluing code. 

We give an operational semantics and a program logic for a functional interpretation 
of Grail, where it is assumed (though not explicitly enforced) that expressions are in 
Administrative-Normal-Form, that is, all intermediate values are explicitly named.

Syntax. The syntax of Grail expressions makes use of mutually disjoint sets $\mathcal{I}$ of integers,
$\mathcal{M}$ of method names, $\mathcal{C}$ of class names, $\mathcal{F}$ of function names (i.e. labels of basic
blocks), $\mathcal{T}$ of (virtual or static) field names and $\mathcal{X}$ of variables, ranged over by $i$, $m$, $c$,
$f$, $t$, and $x$, respectively. We also introduce self as a reserved variable. In the following
grammar, op denotes a primitive operation of type $\mathcal{V} \Rightarrow \mathcal{V} \Rightarrow \mathcal{V}$ such as an arithmetic
operation or a comparison operator. Here $\mathcal{V}$ is the semantic category of values (ranged
over by $v$), comprising integers, references $r$, and the special symbol $\bot$, which stands for
the absence of a value. Boolean values are represented as integers. Heap references are
either null or of the form Ref $l$ where $l$ is a location (represented by a natural number).
Formal parameters of method invocations may be integer or object variables. Actual
arguments are sequences of variable names or immediate values; complex expressions
which may occur as arguments in Camelot functions are eliminated during the
compilation process.

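$a \in \mathit{args} ::= \mathsf{var}\ x \mid \mathsf{null} \mid i$

$e \in \mathit{expr} ::= \mathsf{null} \mid \mathsf{int}\ i \mid \mathsf{var}\ x \mid \mathsf{prim}\ \mathit{op}\ x\ x \mid \mathsf{new}\ c\ [t_i := x_i] \mid x.t \mid x.t := x \mid c \diamond t := x \mid c \diamond t \mid$
$\qquad \mathsf{let}\ x = e\ \mathsf{in}\ e \mid e\,;e \mid \mathsf{if}\ x\ \mathsf{then}\ e\ \mathsf{else}\ e \mid \mathsf{call}\ f \mid x \cdot m(\overline{a}) \mid c \diamond m(\overline{a})$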

Expressions represent basic blocks and are built from operators, constants, and previously
computed values (names). Expressions such as $x.t := y$ (putfield) correspond to
primitive sequences of bytecode instructions that may, as a side effect, alter the heap
or frame stack. Similarly, $c \diamond t$ and $c \diamond t := y$ denote static field lookup and assignment,
which are needed in Camelot's memory management. The binding let $x = e_1$ in $e_2$ is
used if the evaluation of $e_1$ returns an integer or reference value on top of the JVM stack
while $e_1 ; e_2$ represents purely sequential composition, used for example if $e_1$ is a field
update $x.t := y$. Object creation includes the initialisation of the object fields according
to the argument list: the content of variable $x_i$ is stored in field $t_i$. Function calls (call)
follow the Grail calling convention (i.e. correspond to immediate jumps) and do not
carry arguments. The instructions $x \cdot m(\overline{a})$ and $c \diamond m(\overline{a})$ represent virtual (instance) and
static method invocation. Although a formal type and class system may be imposed on
Grail programs, our program logic abstracts from these restrictions; heap and class file
environment are total functions on field and method names, respectively.

We assume that all method declarations employ distinct names for identifying inner 
basic blocks. A program is represented by a table FT mapping each function identifier 
to a list of (distinct) variables (the formal parameters) and an expression, and a table 
MT associating the formal method parameters (again a list of distinct variables) and the 
initial basic block to class names and method identifiers. 



Dynamic Semantics. The machine model is based on semantic domains $\mathcal{H}$ of heaps,
$\mathcal{E}$ of environments (maps from variables to values) and $\mathcal{R}$ of resource components.
A heap h maps locations to objects, where an object comprises a class name and a 
mapping of field names to values. In our formalisation, we follow an approach inspired 
by Burstall, where the heap is split into several components: a total function from field 
names to locations to values, and a partial function from locations to class names. In 
addition, we also introduce a total map for modelling static (reference) fields, mapping 
class names and field names to references. 

Variables which are local to a method invocation are kept in an environment $E$ that
corresponds to the local store of the JVM. Environments are represented as total functions,
with the silent assumption that well-defined method bodies only access variables
which have previously been assigned a value. We use $E\langle x \rangle$ to denote the lookup operation
and $E\langle x := v \rangle$ to denote an update. Since the operational semantics uses environments
to represent the local store of method frames, no explicit frame stack is needed.
The height of the stack is mentioned as part of the resource component.

Resource consumption is modelled by resource tuples $p$ where

$$p = \langle \mathit{clock}\ \ \mathit{callc}\ \ \mathit{invkc}\ \ \mathit{invkdpth} \rangle.$$

The four components range over $\mathbb{N}$ and represent the following costs. The clock
represents a global abstract instruction counter. The callc and invkc components are more
refined, i.e. they count the number of function calls (jump instructions) and method
invocations. We can easily count other types of instructions, but we chose these initially
as interesting cases: for example they may be used to formally verify Grail-level
optimisations such as the replacement of method (tail) recursion by function recursion.
Finally, invkdpth models the maximal invocation depth, i.e. the maximal height of the
frame stack throughout an execution. From this, the maximal frame stack height may
be approximated by considering the maximal size of single frames. The size of the heap
is not monitored explicitly in the resource components, since it can be deduced from
the representation of the object heap as $|\mathit{dom}(h)|$.

The operational semantics and the program logic make use of two operators on
resources, $p \oplus q$ and $p \smile q$. In the first three components, both operators perform
pointwise addition, as all instruction counts behave additively during program
composition. In the fourth component, the operator $\oplus$ again adds the respective components of
$p$ and $q$, while $\smile$ takes their maximum. By employing $\smile$ in the rules for let-bindings
we can thus model the release of frame stacks after the execution of method invocations.
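
The two operators can be written down directly. The following Python sketch is illustrative: the class name `Res` and the function names `oplus` and `smile` are our own, chosen to mirror the symbols in the text.

```python
from typing import NamedTuple

class Res(NamedTuple):
    clock: int     # abstract instruction counter
    callc: int     # number of function calls (jumps)
    invkc: int     # number of method invocations
    invkdpth: int  # maximal invocation depth

def oplus(p: Res, q: Res) -> Res:
    """Add all four components pointwise."""
    return Res(*(a + b for a, b in zip(p, q)))

def smile(p: Res, q: Res) -> Res:
    """Add the three counters, but take the maximum of the depths."""
    return Res(p.clock + q.clock, p.callc + q.callc,
               p.invkc + q.invkc, max(p.invkdpth, q.invkdpth))

# Two computations that each perform one method invocation of depth 1.
p = Res(5, 0, 1, 1)
q = Res(5, 0, 1, 1)
print(smile(p, q))  # run one after the other: the first frame is released
print(oplus(p, q))  # nested: the depths add up
```

Composing the two computations sequentially with `smile` keeps the depth at 1, whereas `oplus` yields depth 2. This difference is why let-bindings and sequential composition use the maximum-taking operator, while the invocation rules add the new frame with the additive one.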

The semantics is a big-step evaluation relation based on the functional interpretation 
of Grail, with judgements of the form 

$$E \vdash h, e \Downarrow (h', v, p).$$

Such a statement reads “in variable environment E and initial heap h, code e evaluates 
to the value v, yielding the heap h' and consuming p resources.” 

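$$E \vdash h, \mathsf{null} \Downarrow (h, \mathsf{null}, \langle 1\ 0\ 0\ 0 \rangle) \quad \text{(NULL)}$$

$$E \vdash h, \mathsf{int}\ i \Downarrow (h, i, \langle 1\ 0\ 0\ 0 \rangle) \quad \text{(INT)}$$

$$E \vdash h, \mathsf{var}\ x \Downarrow (h, E\langle x \rangle, \langle 1\ 0\ 0\ 0 \rangle) \quad \text{(VAR)}$$

$$E \vdash h, \mathsf{prim}\ \mathit{op}\ x\ y \Downarrow (h, \mathit{op}\ (E\langle x \rangle)\ (E\langle y \rangle), \langle 3\ 0\ 0\ 0 \rangle) \quad \text{(PRIM)}$$

$$\frac{E\langle x \rangle = \mathsf{Ref}\ l}{E \vdash h, x.t \Downarrow (h, h(l).t, \langle 2\ 0\ 0\ 0 \rangle)} \quad \text{(GETF)}$$

$$\frac{E\langle x \rangle = \mathsf{Ref}\ l}{E \vdash h, x.t := y \Downarrow (h[l.t \mapsto E\langle y \rangle], \bot, \langle 3\ 0\ 0\ 0 \rangle)} \quad \text{(PUTF)}$$

$$E \vdash h, c \diamond t \Downarrow (h, h(c).t, \langle 2\ 0\ 0\ 0 \rangle) \quad \text{(GETST)}$$

$$E \vdash h, c \diamond t := y \Downarrow (h[c.t \mapsto E\langle y \rangle], \bot, \langle 3\ 0\ 0\ 0 \rangle) \quad \text{(PUTST)}$$

$$\frac{l = \mathit{freshloc}(h)}{E \vdash h, \mathsf{new}\ c\ [t_i := x_i] \Downarrow (h[l \mapsto (c, \{t_i := E\langle x_i \rangle\})], \mathsf{Ref}\ l, \langle (n+1)\ 0\ 0\ 0 \rangle)} \quad \text{(NEW)}$$

$$\frac{E\langle x \rangle = \mathsf{true} \qquad E \vdash h, e_1 \Downarrow (h_1, v, p)}{E \vdash h, \mathsf{if}\ x\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2 \Downarrow (h_1, v, \langle 2\ 0\ 0\ 0 \rangle \oplus p)} \quad \text{(IFTRUE)}$$

$$\frac{E\langle x \rangle = \mathsf{false} \qquad E \vdash h, e_2 \Downarrow (h_1, v, p)}{E \vdash h, \mathsf{if}\ x\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2 \Downarrow (h_1, v, \langle 2\ 0\ 0\ 0 \rangle \oplus p)} \quad \text{(IFFALSE)}$$

$$\frac{E \vdash h, e_1 \Downarrow (h_1, w, p) \qquad E\langle x := w \rangle \vdash h_1, e_2 \Downarrow (h_2, v, q)}{E \vdash h, \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 \Downarrow (h_2, v, \langle 1\ 0\ 0\ 0 \rangle \oplus (p \smile q))} \quad \text{(LET)}$$

$$\frac{E \vdash h, e_1 \Downarrow (h_1, \bot, p) \qquad E \vdash h_1, e_2 \Downarrow (h_2, v, q)}{E \vdash h, e_1 ; e_2 \Downarrow (h_2, v, p \smile q)} \quad \text{(COMP)}$$

$$\frac{E \vdash h, \mathit{snd}(\mathit{FT}\ f) \Downarrow (h_1, v, p)}{E \vdash h, \mathsf{call}\ f \Downarrow (h_1, v, \langle 1\ 1\ 0\ 0 \rangle \oplus p)} \quad \text{(CALL)}$$

$$\frac{(\mathit{newframe}\ \mathsf{null}\ \mathit{fst}(\mathit{MT}\ c\ m)\ \overline{a}\ E) \vdash h, \mathit{snd}(\mathit{MT}\ c\ m) \Downarrow (h_1, v, p)}{E \vdash h, c \diamond m(\overline{a}) \Downarrow (h_1, v, \langle (2 + |\overline{a}|)\ 0\ 1\ 1 \rangle \oplus p)} \quad \text{(SINV)}$$

$$\frac{\mathit{classOf}\ E\ h\ x\ c \qquad (\mathit{newframe}\ E\langle x \rangle\ \mathit{fst}(\mathit{MT}\ c\ m)\ \overline{a}\ E) \vdash h, \mathit{snd}(\mathit{MT}\ c\ m) \Downarrow (h_1, v, p)}{E \vdash h, x \cdot m(\overline{a}) \Downarrow (h_1, v, \langle (4 + |\overline{a}|)\ 0\ 1\ 1 \rangle \oplus p)} \quad \text{(VINV)}$$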

In rule GETF, the notation $h(l).t$ represents the value of field $t$ in the object at heap
location $l$, while in rule PUTF the notation $h[l.t \mapsto v]$ denotes the corresponding update
operation. Similarly for static fields. In rule NEW, the function $\mathit{freshloc}(h)$ returns a
fresh location outside the domain of $h$, and $h[l \mapsto (c, \{t_i := E\langle x_i \rangle\})]$ represents the heap
that agrees with $h$ on all locations different from $l$ and maps $l$ to an object of class $c$,
with field entries $t_i := E\langle x_i \rangle$. In the rules CALL, SINV and VINV the lookup functions
FT and MT are used to obtain function and method bodies from names. These are here
implemented as static tables, though they could be used to model a class hierarchy.
In particular MT has type $\mathcal{C} \Rightarrow \mathcal{M} \Rightarrow (\mathcal{X}\ \mathit{list} \times \mathit{expr})$, where the parameter passing in
method invocations is modelled by accessing the parameter values from the caller's
environment. Each method invocation allocates a new frame on the frame stack, where the
function newframe creates the appropriate environment, given a reference to the invoking
object, the formal parameters and the actual arguments. The environment contains
bindings for the self object and the method parameters. If we invoke a static method we
set the self variable to null, otherwise to the current object.

The resource tuples in the operational semantics abstractly characterise resource
consumption in an unspecified virtual machine; because resources are treated separately,
these values could be changed for particular virtual machines. The temporal costs
associated with basic instructions reflect the number of bytecode instructions to which
the expression expands. For example, the PUTF operation involves two instructions for
pushing the object pointer $E\langle x \rangle$ and the new content $E\langle y \rangle$ onto the operand stack, plus
one additional instruction for performing the actual field modification. In rule NEW we
charge a single clock tick for object creation, and $n$ for field initialisation. The costs for
primitive operations may be generalised to a table lookup. In the rules for conditionals,
we charge for pushing the value $E\langle x \rangle$ onto the stack, with an additional clock tick
for evaluating the branch condition and performing the appropriate jump. In rule CALL,
the Grail functional call convention explains why we treat the call as a jump, continuing
with the execution of the function body. We charge for one anonymous instruction, and also
explicitly for the execution of a jump. In rule SINV, the body of the method is executed in
an environment which represents a fresh frame. The instruction counter is incremented
by 2 for pushing and popping the frame and $|\overline{a}|$ for evaluating the arguments. In
addition, both the invocation counter and the invocation depth are incremented by one;
the usage of $\oplus$ ensures that the depth correctly represents the nesting depth of frames.
Finally, in rule VINV, the predicate classOf $E\ h\ x\ c$ first retrieves the dynamic class
name $c$ associated to the object pointed to by $x$. Then, the method body associated to
$m$ and $c$ is executed in a fresh environment which contains the reference to $E\langle x \rangle$ in
variable self and the formal parameters as above. The costs charged arise again by
considering the evaluation of $E\langle x \rangle$ and the method arguments, and the pushing and popping
of the frame, but we also charge one clock tick for the indirection needed to retrieve the
correct method body from the class file.
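
To make the cost accounting concrete, here is a minimal Python sketch of a big-step evaluator for a fragment of the rules (NULL, INT, VAR, PRIM, NEW, GETF, IFTRUE/IFFALSE and LET). It is an illustration under stated assumptions, not the Isabelle/HOL formalisation: expressions are encoded as tagged tuples, heaps and environments as dictionaries, `freshloc` is approximated by the heap size, and the booleans true and false are assumed to be the integers 1 and 0. Function and method calls are omitted, so the last three resource components stay at zero.

```python
import operator

def oplus(p, q):
    # Pointwise addition of resource tuples (clock, callc, invkc, invkdpth).
    return tuple(a + b for a, b in zip(p, q))

def smile(p, q):
    # Like oplus, but the invocation depths are combined with max.
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2], max(p[3], q[3]))

def evaluate(E, h, e):
    """Return (h', v, p) for expression e in environment E and heap h."""
    tag = e[0]
    if tag == "null":
        return h, "null", (1, 0, 0, 0)
    if tag == "int":
        return h, e[1], (1, 0, 0, 0)
    if tag == "var":
        return h, E[e[1]], (1, 0, 0, 0)
    if tag == "prim":
        _, op, x, y = e
        return h, op(E[x], E[y]), (3, 0, 0, 0)
    if tag == "new":
        _, c, inits = e                  # inits: list of (field, variable) pairs
        l = len(h)                       # assumed stand-in for freshloc(h)
        h1 = {**h, l: (c, {t: E[x] for t, x in inits})}
        return h1, ("Ref", l), (len(inits) + 1, 0, 0, 0)
    if tag == "getf":
        _, x, t = e
        _, l = E[x]                      # E<x> must be of the form Ref l
        return h, h[l][1][t], (2, 0, 0, 0)
    if tag == "if":
        _, x, e1, e2 = e
        if E[x] not in (0, 1):
            raise ValueError("condition is not a boolean")
        h1, v, p = evaluate(E, h, e1 if E[x] == 1 else e2)
        return h1, v, oplus((2, 0, 0, 0), p)
    if tag == "let":
        _, x, e1, e2 = e
        h1, w, p = evaluate(E, h, e1)
        h2, v, q = evaluate({**E, x: w}, h1, e2)
        return h2, v, oplus((1, 0, 0, 0), smile(p, q))
    raise ValueError(f"unsupported expression: {tag}")

# let y = prim add x x in new Cell [val := y]
prog = ("let", "y", ("prim", operator.add, "x", "x"),
        ("new", "Cell", [("val", "y")]))
heap, value, cost = evaluate({"x": 21}, {}, prog)
print(heap, value, cost)
```

For this program the clock component is 6: one tick for the let, three for the primitive operation, and two for creating an object with one field. The final heap contains exactly one more object than the initial one.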
3 Program Logic



The program logic targets the partial correctness of resource bounds such as heap
allocation and combines aspects of VDM-style verification [14] and Abadi-Leino's logic
for object calculi [1]. Sequents are of the form $\Gamma \rhd e : P$ and relate a Grail expression
$e \in \mathit{expr}$ to a specification $P \in \mathcal{A}$ in a context $\Gamma \in \mathcal{G}$ (see definition below). We
abbreviate $\emptyset \rhd e : P$ to $\rhd e : P$. We follow the extensional approach to the representation of
assertions [15], where specifications are predicates (in the meta-logic) over semantic
components and can refer to the initial and final heaps of a program expression, the
initial environment, the resources consumed and the result value:
$\mathcal{A} = \mathcal{E} \Rightarrow \mathcal{H} \Rightarrow \mathcal{H} \Rightarrow \mathcal{V} \Rightarrow \mathcal{R} \Rightarrow \mathcal{B}$, where $\mathcal{B}$ is the set of booleans.
Satisfaction of a specification $P$ by program $e$ is denoted by $\models e : P$. We interpret a
judgement $\models e : \lambda E\,h\,h'\,v\,p.\ P\,E\,h\,h'\,v\,p$
to mean that whenever the execution of $e$ for initial heap $h$ and environment $E$
terminates and delivers final heap $h'$, result $v$ and resources $p$, $P$ is satisfied, that is that
$E \vdash h, e \Downarrow (h', v, p)$ implies $P\,E\,h\,h'\,v\,p$.

Similar to assertions in VDM logics, our specifications relate pre- and post-states
without auxiliary variables. For example, programs that do not allocate heap space
satisfy the assertion $|\mathit{dom}(h)| = |\mathit{dom}(h')|$.



Rules: In the program logic, contexts $\Gamma$ manage assumptions when dealing with (mutually)
recursive or externally defined methods. They consist of pairs of expressions and
specifications: $\mathcal{G} = \mathit{expr} \times \mathcal{A}$. In addition to rules for each form of program expression
there are two logical rules, VAX and VCONSEQ.



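$$\frac{(e,P) \in \Gamma}{\Gamma \rhd e : P} \quad (\text{VAX})$$

$$\frac{\Gamma \rhd e : P \qquad \forall E\,h\,h'\,v\,p.\ P\,E\,h\,h'\,v\,p \to Q\,E\,h\,h'\,v\,p}{\Gamma \rhd e : Q} \quad (\text{VCONSEQ})$$

$$\frac{}{\Gamma \rhd \mathtt{null} : \lambda E\,h\,h'\,v\,p.\ h' = h \wedge v = \mathtt{null} \wedge p = \langle 1\ 0\ 0\ 0 \rangle} \quad (\text{VNULL})$$

$$\frac{}{\Gamma \rhd \mathtt{int}\ i : \lambda E\,h\,h'\,v\,p.\ h' = h \wedge v = i \wedge p = \langle 1\ 0\ 0\ 0 \rangle} \quad (\text{VINT})$$

$$\frac{}{\Gamma \rhd \mathtt{var}\ x : \lambda E\,h\,h'\,v\,p.\ h' = h \wedge v = E(x) \wedge p = \langle 1\ 0\ 0\ 0 \rangle} \quad (\text{VVAR})$$

$$\frac{}{\Gamma \rhd \mathtt{prim}\ op\ x\ y : \lambda E\,h\,h'\,v\,p.\ v = op\ E(x)\ E(y) \wedge h' = h \wedge p = \langle 3\ 0\ 0\ 0 \rangle} \quad (\text{VPRIM})$$

$$\frac{}{\Gamma \rhd x.t : \lambda E\,h\,h'\,v\,p.\ \exists l.\ E(x) = \mathit{Ref}\ l \wedge h' = h \wedge v = h'(l).t \wedge p = \langle 2\ 0\ 0\ 0 \rangle} \quad (\text{VGETF})$$

$$\frac{}{\Gamma \rhd x.t := y : \lambda E\,h\,h'\,v\,p.\ \exists l.\ E(x) = \mathit{Ref}\ l \wedge p = \langle 3\ 0\ 0\ 0 \rangle \wedge h' = h[l.t \mapsto E(y)] \wedge v = \bot} \quad (\text{VPUTF})$$

$$\frac{}{\Gamma \rhd c \diamond t : \lambda E\,h\,h'\,v\,p.\ h' = h \wedge v = h(c).t \wedge p = \langle 2\ 0\ 0\ 0 \rangle} \quad (\text{VGETST})$$

$$\frac{}{\Gamma \rhd c \diamond t := y : \lambda E\,h\,h'\,v\,p.\ h' = h[c.t \mapsto E(y)] \wedge v = \bot \wedge p = \langle 3\ 0\ 0\ 0 \rangle} \quad (\text{VPUTST})$$

$$\frac{}{\Gamma \rhd \mathtt{new}\ c\ [t_i := x_i] : \lambda E\,h\,h'\,v\,p.\ \exists l.\ \begin{array}[t]{l} l = \mathit{freshloc}(h) \wedge p = \langle (n+1)\ 0\ 0\ 0 \rangle \;\wedge \\ h' = h[l \mapsto (c, \{t_i := E(x_i)\})] \wedge v = \mathit{Ref}\ l \end{array}} \quad (\text{VNEW})$$

$$\frac{\Gamma \rhd e_1 : P_1 \qquad \Gamma \rhd e_2 : P_2}{\Gamma \rhd \mathtt{if}\ x\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 : \lambda E\,h\,h'\,v\,p.\ \exists p'.\ \begin{array}[t]{l} p = p' \oplus \langle 2\ 0\ 0\ 0 \rangle \;\wedge \\ (E(x) = \mathit{true} \to P_1\,E\,h\,h'\,v\,p') \;\wedge \\ (E(x) = \mathit{false} \to P_2\,E\,h\,h'\,v\,p') \;\wedge \\ (E(x) = \mathit{true} \vee E(x) = \mathit{false}) \end{array}} \quad (\text{VIF})$$

$$\frac{\Gamma \rhd e_1 : P_1 \qquad \Gamma \rhd e_2 : P_2}{\Gamma \rhd \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2 : \lambda E\,h\,h'\,v\,p.\ \exists p_1\,p_2\,h_1\,w.\ \begin{array}[t]{l} (P_1\,E\,h\,h_1\,w\,p_1) \wedge w \neq \bot \;\wedge \\ (P_2\,(E\langle x := w \rangle)\,h_1\,h'\,v\,p_2) \;\wedge \\ p = \langle 1\ 0\ 0\ 0 \rangle \oplus (p_1 \smile p_2) \end{array}} \quad (\text{VLET})$$

$$\frac{\Gamma \rhd e_1 : P_1 \qquad \Gamma \rhd e_2 : P_2}{\Gamma \rhd e_1 ; e_2 : \lambda E\,h\,h'\,v\,p.\ \exists p_1\,p_2\,h_1.\ \begin{array}[t]{l} P_1\,E\,h\,h_1\,\bot\,p_1 \;\wedge \\ P_2\,E\,h_1\,h'\,v\,p_2 \wedge p = p_1 \smile p_2 \end{array}} \quad (\text{VCOMP})$$

$$\frac{\Gamma \cup \{(\mathtt{call}\ f, P)\} \rhd snd(FT\ f) : \lambda E\,h\,h'\,v\,p.\ P\,E\,h\,h'\,v\,(\langle 1\ 1\ 0\ 0 \rangle \oplus p)}{\Gamma \rhd \mathtt{call}\ f : P} \quad (\text{VCALL})$$

$$\frac{\Gamma \cup \{(c \diamond m(a), P)\} \rhd snd(MT\ c\ m) : \lambda E\,h\,h'\,v\,p.\ \forall E'.\ \begin{array}[t]{l} E = (\mathit{newframe}\ \mathtt{null}\ (fst(MT\ c\ m))\ a\ E') \\ \to P\,E'\,h\,h'\,v\,(\langle (2+|a|)\ 0\ 1\ 1 \rangle \oplus p) \end{array}}{\Gamma \rhd c \diamond m(a) : P} \quad (\text{VSINV})$$

$$\frac{\Gamma \cup \{(x \cdot m(a), P)\} \rhd snd(MT\ c\ m) : \lambda E\,h\,h'\,v\,p.\ \forall E'.\ \begin{array}[t]{l} (\mathit{classOf}\ E'\ h\ x\ c \;\wedge \\ E = (\mathit{newframe}\ (E'(x))\ (fst(MT\ c\ m))\ a\ E')) \\ \to P\,E'\,h\,h'\,v\,(\langle (4+|a|)\ 0\ 1\ 1 \rangle \oplus p) \end{array}}{\Gamma \rhd x \cdot m(a) : P} \quad (\text{VVINV})$$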



The axiom rule VAX allows one to use specifications found in the context. The consequence
rule VCONSEQ derives an assertion $Q$ that follows from another assertion $P$. The
leaf rules (VNULL to VNEW) directly model the corresponding rules in the operational
semantics, with constants for the resource tuples. The VIF rule uses the appropriate
assertion based on the boolean value in the variable $x$. Since the evaluation of the branch
condition does not modify the heap we only existentially quantify over the resource
tuple $p'$. In contrast, rule VLET existentially quantifies over the result value $w$, the
heap $h_1$ resulting from evaluating $e_1$, and the resources from $e_1$ and $e_2$. Apart
from the absence of environment update, rule VCOMP is similar to VLET. By relating pre- and
post-conditions in a single assertion we avoid the complications associated with the usual
VDM rules for sequencing [14]. However, this makes reasoning about total correctness more
difficult. The rules for recursive functions and methods involve the context and
generalize Hoare's original rule for parameterless recursive procedures. They require one
to prove that the function or method body satisfies the required specification (with an
updated resource component) under the additional assumption that the assertion holds
for further calls or invocations.
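The resource bookkeeping in these rules can be sketched in Python. The component names
(clock, calls, invocations, depth) and the definitions of the two operators are
assumptions made for this illustration: `oplus` is taken to add componentwise, and `smile`
(the sequencing operator of VLET and VCOMP) is taken to add the three counters while
keeping the larger nesting depth. The formal definitions belong to the operational
semantics and are not restated in this section.

```python
from typing import NamedTuple


class Res(NamedTuple):
    clock: int        # instructions executed
    calls: int        # function calls (jumps)
    invocations: int  # method invocations
    depth: int        # maximal nesting depth of invocation frames


def oplus(p: Res, q: Res) -> Res:
    """Assumed reading of the circled plus: add every component."""
    return Res(*(a + b for a, b in zip(p, q)))


def smile(p: Res, q: Res) -> Res:
    """Assumed reading of sequencing: counters add, nesting depth is a maximum."""
    return Res(p.clock + q.clock, p.calls + q.calls,
               p.invocations + q.invocations, max(p.depth, q.depth))


def let_cost(p1: Res, p2: Res) -> Res:
    """Resource component of VLET: <1 0 0 0> oplus (p1 smile p2)."""
    return oplus(Res(1, 0, 0, 0), smile(p1, p2))


def static_invoke_cost(num_args: int, body: Res) -> Res:
    """Resource component of VSINV: <(2+|a|) 0 1 1> oplus p."""
    return oplus(Res(2 + num_args, 0, 1, 1), body)
```

Under these assumptions, nested invocations accumulate depth through `oplus`, while two
expressions evaluated one after the other share the deeper of their two depths.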
Admissible Rules: A context weakening rule is easily seen to be admissible. We can
also prove the following cut rule by induction on derivations of
$\{(e',P)\} \cup \Delta \rhd e : Q$,

$$\frac{\{(e',P)\} \cup \Delta \rhd e : Q \qquad \Gamma \rhd e' : P \qquad \Gamma \subseteq \Delta}{\Delta \rhd e : Q} \quad (\text{CUT})$$



One of the contributions of this paper lies in an innovative approach to mutually
recursive procedures and adaptation. In fact, rules VCALL, VSINV and VVINV already
cover the case of mutual recursion. So we do not need a separate derivation system for
judgements with sets of assertions and related set introduction and elimination rules,
as for example Nipkow does [24], nor do we need to modify the consequence rule to
take care of adaptation. The treatment is based on specification tables for functions
and methods. A function specification table FST maps each function identifier to an
assertion, a virtual method specification table vMST maps triples consisting of variable
names, method names and (actual) argument lists to assertions, and a static method
specification table sMST maps triples consisting of class names, method names and
(actual) argument lists to assertions. Since the types allow us to disambiguate between
the three tables, we use the notation ST to refer to their union.

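A context $\Gamma$ is good with respect to the specification tables, notation
$good_{ST}(\Gamma)$, if all entries $(e,P) \in \Gamma$ satisfy

$$\begin{array}{l}
(\exists f.\ e = \mathtt{call}\ f \wedge P = FST\ f \wedge \Gamma \rhd snd(FT\ f) : Q_0(f)) \;\vee \\
(\exists c\,m\,a.\ e = c \diamond m(a) \wedge P = ST\ c\ m\ a \wedge \forall b.\ \Gamma \rhd snd(MT\ c\ m) : Q_1(c,m,b)) \;\vee \\
(\exists x\,m\,a.\ e = x \cdot m(a) \wedge P = ST\ x\ m\ a \wedge \forall y\,b\,c.\ \Gamma \rhd snd(MT\ c\ m) : Q_2(c,m,b,y))
\end{array}$$

where

$$\begin{array}{rcl}
Q_0(f) &=& \lambda E\,h\,h'\,v\,p.\ (FST\ f)\,E\,h\,h'\,v\,(\langle 1\ 1\ 0\ 0 \rangle \oplus p) \\
Q_1(c,m,b) &=& \lambda E\,h\,h'\,v\,p.\ \forall E'.\ E = (\mathit{newframe}\ \mathtt{null}\ (fst(MT\ c\ m))\ b\ E') \\
 & & \quad \to ST\ c\ m\ b\ E'\,h\,h'\,v\,(\langle (2+|b|)\ 0\ 1\ 1 \rangle \oplus p) \\
Q_2(c,m,b,y) &=& \lambda E\,h\,h'\,v\,p.\ \forall E'.\ (\mathit{classOf}\ E'\ h\ y\ c \;\wedge \\
 & & \quad E = (\mathit{newframe}\ (E'(y))\ (fst(MT\ c\ m))\ b\ E')) \\
 & & \quad \to ST\ y\ m\ b\ E'\,h\,h'\,v\,(\langle (4+|b|)\ 0\ 1\ 1 \rangle \oplus p).
\end{array}$$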

Using the cut rule, we can prove that good contexts are subset-closed.

Lemma 1. $good_{ST}(\Gamma) \to good_{ST}(\Gamma - \{(e,P)\})$.

By combining this lemma with another application of CUT, one can prove by induction
on the size of $\Gamma$ the following rule for mutually recursive function calls or method
invocations, for the empty context,

$$\frac{\Gamma\ \text{finite} \qquad good_{ST}(\Gamma) \qquad (e,P) \in \Gamma}{\rhd e : P} \quad (\text{MUTREC})$$



A variant of Lemma 1 also plays an important part in the proof of our adaptation
rule. Parameter adaptation is notoriously problematic and has often been coupled with
rules of consequence, resulting in fairly complicated rules [15, 24, 25]. Instead, building
on the notion of good, we can prove (via cut and weakening) the following lemma,
which allows one to change the actual parameters from $b$ to $a$.

Lemma 2. $(good_{ST}(\Gamma) \wedge (c \diamond m(b), ST\ c\ m\ b) \in \Gamma) \to$
$\Gamma - \{(c \diamond m(b), ST\ c\ m\ b)\} \rhd c \diamond m(a) : ST\ c\ m\ a$

The predicate good ensures that, for every pair of method invocation and specification
over given actual arguments, the context proves that the method body satisfies the same
specification over any other arguments, provided the former is updated to reflect the
new environment with the appropriate binding for the formal parameters. Since we
want to prove specifications in the empty context, the lemma allows one to shrink the
context.

From that, adaptation in the empty context follows:

$$\frac{\Gamma\ \text{finite} \qquad good_{ST}(\Gamma) \qquad (c \diamond m(b), ST\ c\ m\ b) \in \Gamma}{\rhd c \diamond m(a) : ST\ c\ m\ a} \quad (\text{ADAPTS})$$

We shall see this rule in action in Section 4. Both Lemma 2 and rule ADAPTS have
counterparts for virtual methods.



Soundness. We first define the validity of an assertion for a given program expression
in a given context. In order to deal with soundness of function calls and method
invocations we additionally parameterise the operational semantics by a natural number
acting as the height of the evaluation [10, 15, 24].

Definition 1. (Validity) Specification $P$ is valid for $e$, written $\models_n e : P$, if

$$(m < n \wedge E \vdash h, e \Downarrow_m (h',v,p)) \to P\,E\,h\,h'\,v\,p.$$

We define $\models e : P$ as $\forall n.\ \models_n e : P$. Note that the counter $n$
restricts the set of pre- and post-states for which $P$ has to be fulfilled. It is easy to
show that this bound, occurring negatively in the validity formula, can be weakened,
i.e. $(m < n \wedge \models_n e : P) \to \models_m e : P$. Validity is generalised to
contexts as follows:

Definition 2. (Context Validity) Context $\Gamma$ is valid, written $\models_n \Gamma$, if
$\models_n e : P$ holds for all $(e,P) \in \Gamma$. Assertion $P$ is valid for $e$ in
context $\Gamma$, denoted $\Gamma \models_n e : P$, if $\models_n \Gamma$ implies
$\models_n e : P$.

The soundness theorem follows from a stronger result expressing the soundness
property for contextual, relativised validity.

Theorem 1. (Soundness) $\Gamma \rhd e : P \to \forall n.\ \Gamma \models_n e : P$.

Completeness. The program logic may be proven complete relative to the ambient
logic (here HOL) using the notion of strongest specifications, similar to most general
triples in Hoare-style verification.

Definition 3. (Strongest Specification) The strongest specification of expression $e$ is
$SSpec(e) = \lambda E\,h\,h'\,v\,p.\ E \vdash h, e \Downarrow (h',v,p)$.

It is not difficult to prove that strongest specifications are valid, i.e.
$\models e : SSpec(e)$, and further that they are stronger than any other valid
specification, that is, $(\models e : P \wedge SSpec(e)\,E\,h\,h'\,v\,p) \to P\,E\,h\,h'\,v\,p$.

The overall proof idea of completeness follows [10, 24]: we first prove a lemma that
allows one to relate any expression $e$ to $SSpec(e)$ in a context $\Gamma$, provided that
$\Gamma$ in turn relates each function or method call to its strongest specification.

Lemma 3.
$$\left(\begin{array}{l}
(\forall f.\ \Gamma \rhd \mathtt{call}\ f : SSpec(\mathtt{call}\ f)) \;\wedge \\
(\forall c\,m\,a.\ \Gamma \rhd c \diamond m(a) : SSpec(c \diamond m(a))) \;\wedge \\
(\forall x\,m\,a.\ \Gamma \rhd x \cdot m(a) : SSpec(x \cdot m(a)))
\end{array}\right) \to \Gamma \rhd e : SSpec(e)$$

The proof of this lemma proceeds by induction on the structure of $e$. Next, we define a
specific context, $\hat{\Gamma}$, containing exactly the strongest specifications for all
function calls and method invocations.

$$\hat{\Gamma} = \left\{ (e,P) \;\middle|\; P = SSpec(e) \wedge \left(\begin{array}{l}
(\exists f.\ e = \mathtt{call}\ f) \;\vee\; (\exists c\,m\,a.\ e = c \diamond m(a)) \;\vee \\
(\exists x\,m\,a.\ e = x \cdot m(a))
\end{array}\right) \right\}.$$

We also define specification tables that associate the strongest assertions to all calls and
invocations:

$$ST = (\lambda f.\ SSpec(\mathtt{call}\ f)) \cup (\lambda c\,m\,a.\ SSpec(c \diamond m(a))) \cup (\lambda x\,m\,a.\ SSpec(x \cdot m(a))).$$

Next, we show that $\hat{\Gamma}$ is good with respect to these tables:

Lemma 4. $good_{ST}(\hat{\Gamma})$.

On the other hand, combining a variant of rule CUT and MUTREC with Lemma 3 yields

Lemma 5. If $good_{ST}(\hat{\Gamma})$ and $\hat{\Gamma}$ is finite, then
$\rhd e : SSpec(e)$ holds for all $e$,

for arbitrary specification tables $ST$. Finally, combining Lemmas 4 and 5 and rule
VCONSEQ yields

Theorem 2. (Completeness) If $\hat{\Gamma}$ is finite and $\models e : P$ then $\rhd e : P$.

The finiteness condition merely represents a constraint on the syntactic categories of
function and method names. It is fulfilled for any concrete program.



4 Examples 

In this section we give examples of proving resource properties of Grail programs working
on integer lists. We first discuss how lists are modelled in our formalisation and then
consider in-place list reversal and doubling elements in a list as example programs. The
Grail code in this section corresponds to the Isabelle output of the Camelot compiler.

During the compilation, heap-allocated data structures arise from algebraic datatypes
in Camelot. Values of the type ilist = Nil | Cons of int * ilist are represented
as a linked list of objects of the diamond class. Each node contains fields HD,
TL and TAG, where TAG indicates the constructor (Nil or Cons) used to create the cell.
Since our verification targets the consumption of resources rather than full correctness
we use a representation predicate that ensures that a portion of the heap represents a list
structure without considering the data contained in individual cells. The predicate takes
the form $h, l \models_X n$, to be read as "starting at location $l$ the sub-heap of $h$
given by the set $X$ of locations contains a list of length $n$". It is defined inductively,
using the additional notation $class_h(l)$ to refer to the class of the object located at
$l$ in heap $h$.



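$$(class_h(l) = \mathrm{ILIST} \wedge h(l).\mathrm{TAG} = 0) \to h, l \models_{\{l\}} 0$$

$$\left(\begin{array}{l}
class_h(l) = \mathrm{ILIST} \wedge h(l).\mathrm{TAG} = 1 \;\wedge \\
h(l).\mathrm{TL} = \mathit{Ref}\ r \wedge l \notin X \wedge h, r \models_X n
\end{array}\right) \to h, l \models_{X \cup \{l\}} n + 1$$

Similar predicates have been used by Reynolds in separation logic [26]. Notice that in
the second case the reference $l$ has to be distinct from all previously used locations $X$.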
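The two rules can be read as a procedure that walks the TL chain and collects the visited
locations. The Python sketch below does this on a toy heap; the heap encoding (a
dictionary from integer locations to a class name and a field dictionary) and the function
name `list_footprint` are assumptions made for illustration only.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical heap encoding: location -> (class name, field dictionary).
# A TL field holds the location of the next cell.
Heap = Dict[int, Tuple[str, dict]]


def list_footprint(h: Heap, l: int) -> Optional[Tuple[Set[int], int]]:
    """Return (X, n) such that h, l |=_X n is derivable, or None otherwise."""
    seen: Set[int] = set()
    cur = l
    while True:
        # A revisited location violates the side condition that l is not in X;
        # a dangling location has no class. Neither case has a derivation.
        if cur in seen or cur not in h:
            return None
        cls, fields = h[cur]
        if cls != "ILIST":
            return None
        seen.add(cur)
        tag = fields.get("TAG")
        if tag == 0:                       # first rule: a Nil cell ends the list
            return (seen, len(seen) - 1)
        if tag != 1 or fields.get("TL") is None:
            return None
        cur = fields["TL"]                 # second rule: continue with the tail
```

The HD fields are never inspected, matching the remark that the predicate ignores the data
stored in individual cells.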



In-place reversal: Returning to our motivating example from the introduction, the
following Grail code is produced for the method rev in class ILIST with formal parameters
[l, acc]:

    let tag = l.TAG in let b = prim iszero tag tag in
    if b then var acc
    else let h = l.HD in let t = l.TL in let one = int 1 in
         l.TAG := one; l.HD := h; l.TL := acc; ILIST⋄rev([t, l])

We constrain the specification tables to contain the entry

$$\begin{array}{l}
ST\ \mathrm{ILIST}\ \mathit{rev}\ z\ E\,h\,h'\,v\,p = \\
\quad \forall a\,b\,n\,m\,X\,Y.\ (\mathit{eval}\ E\ z = [\mathit{Ref}\ a, \mathit{Ref}\ b] \wedge h, a \models_X n \wedge h, b \models_Y m \wedge X \cap Y = \emptyset) \\
\quad \to |dom(h)| = |dom(h')| \wedge p = \langle (29n+13)\ 0\ (n+1)\ (n+1) \rangle
\end{array}$$

If the first method argument points initially to a list of length $n$, and the second
argument points to some other (disjoint) list, any terminating execution of rev returns a
heap of the same size as the initial heap, and the number of instructions and function
calls (jump instructions) depend linearly on $n$. The function eval implements the
evaluation of method arguments and is part of the newframe construction. We aim to prove
the property

$$\rhd \mathrm{ILIST} \diamond \mathit{rev}([x,y]) : ST\ \mathrm{ILIST}\ \mathit{rev}\ [x,y] \qquad (1)$$

which states that an invocation of rev with (arbitrary) arguments $x$ and $y$ satisfies its
specification. The generic structure of a proof of such a resource predicate first applies
the rule ADAPTS. The required context $\Gamma$ contains one entry for each method
invocation that occurs in the method body, pairing each such call with its specification:

$$\Gamma = \{(\mathrm{ILIST} \diamond \mathit{rev}([t,l]),\ ST\ \mathrm{ILIST}\ \mathit{rev}\ [t,l])\}.$$

As the main lemma we then prove that $\Gamma$ is good with respect to the specification
tables:

$$good_{ST}(\Gamma).$$

The proof of this statement proceeds by first applying the VDM rules VSINV and VCONSEQ,
and then the other syntax-directed rules according to the program text, closing the
recursion by an invocation of VAX. This first phase can be seen as a classical VCG over
the program logic rules. Two side conditions remain, requiring us to show that both
branches satisfy the specification; the verification condition of the recursion case
amounts to a loop invariant. Both side conditions can be discharged by unfolding the
definition of $h, l \models_X n$ and instantiating some quantifiers.

Where do the polynomials in the specification come from? Currently, we have left
those values indeterminate and have them generated during a proof. In a later phase of
the project, the Camelot compiler will generate certificates for such resource properties
based on high-level program analysis similar to [12]'s type system for heap space
consumption. The syntactic form of rev would allow a tail-call optimisation, where the
recursive method invocation is transformed into a recursive function call satisfying the
Grail calling convention.

Doubling a list: Consider the following code for doubling the elements of a list.

    let double l = match l with
        Nil@d -> Nil@d
      | Cons(h,t)@d -> Cons(h, Cons(h, double t)@d)

Remember that the usage of @ indicates that heap cells which are freed during a match may
be reused later, but only once [11], so the outer application of Cons will require
the allocation of fresh memory. Since the recursion occurs in non-tail position, it cannot
be replaced by a simple function recursion and the resulting Grail code contains a static
method ILIST⋄double(l) with body

    let x = l.TAG in let b = prim iszero x x in
    if b then let zero = int 0 in l.TAG := zero; var l
    else let x = l.HD in let t = l.TL in let y = var l in
         let z = ILIST⋄double([t]) in let one = int 1 in
         y.TAG := one; y.HD := x; y.TL := z; let l = var y in
         new ILIST [(TAG, one), (HD, x), (TL, l)]

The specification has the same general structure as before, but now asserts that the heap
grows by $n$ many objects, that no function calls occur, and that both the number and the
nesting depth of method invocations are linear in $n$.

$$\begin{array}{l}
ST\ \mathrm{ILIST}\ \mathit{double}\ z\ E\,h\,h'\,v\,p = \\
\quad \forall a\,n\,X.\ (\mathit{eval}\ E\ z = [\mathit{Ref}\ a] \wedge h, a \models_X n) \\
\quad \to |dom(h')| = |dom(h)| + n \wedge p = \langle (35n+18)\ 0\ (n+1)\ (n+1) \rangle
\end{array}$$

We prove the following resource property for an arbitrary $x$:

$$\rhd \mathrm{ILIST} \diamond \mathit{double}([x]) : ST\ \mathrm{ILIST}\ \mathit{double}\ [x]$$

The proof has the same overall structure as the previous one, where the auxiliary lemma
now reads

$$good_{ST}(\{(\mathrm{ILIST} \diamond \mathit{double}([t]),\ ST\ \mathrm{ILIST}\ \mathit{double}\ [t])\}).$$
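The heap-size and invocation-count parts of both specifications can be observed on a toy
model. The Python sketch below mirrors the two method bodies on a dictionary heap and
counts invocations and nesting depth. The heap encoding, the `Counter` class, the helper
names and the allocation strategy are all assumptions made for illustration. The clock
component is not modelled, since instruction costs depend on the operational semantics,
which is not restated here.

```python
class Counter:
    """Tracks method invocations and the maximal nesting depth of frames."""

    def __init__(self):
        self.invocations = 0
        self.depth = 0
        self._current = 0

    def enter(self):
        self.invocations += 1
        self._current += 1
        self.depth = max(self.depth, self._current)

    def leave(self):
        self._current -= 1


def rev(h, l, acc, c):
    """Mirror of the rev body: relink cells in place, allocate nothing."""
    c.enter()
    if h[l]["TAG"] == 0:
        result = acc
    else:
        t = h[l]["TL"]
        h[l]["TL"] = acc               # TAG and HD are rewritten unchanged in Grail
        result = rev(h, t, l, c)
    c.leave()
    return result


def double(h, l, c):
    """Mirror of the double body: reuse the matched cell, allocate one new cell."""
    c.enter()
    if h[l]["TAG"] == 0:
        result = l
    else:
        x = h[l]["HD"]
        z = double(h, h[l]["TL"], c)
        h[l]["TL"] = z                 # the inner Cons reuses the old cell
        fresh = max(h) + 1             # hypothetical stand-in for freshloc
        h[fresh] = {"TAG": 1, "HD": x, "TL": l}
        result = fresh
    c.leave()
    return result


def make_list(values):
    """Build a heap holding the given list; returns (heap, head location)."""
    h = {0: {"TAG": 0}}
    head = 0
    for i, v in enumerate(reversed(values), start=1):
        h[i] = {"TAG": 1, "HD": v, "TL": head}
        head = i
    return h, head


def to_values(h, l):
    """Read the list stored at location l back into a Python list."""
    out = []
    while h[l]["TAG"] == 1:
        out.append(h[l]["HD"])
        l = h[l]["TL"]
    return out
```

For a list of length n, both mirrors perform n + 1 invocations nested n + 1 deep, in line
with the third and fourth components of the resource tuples above.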
5 Related Work

Most closely related to our work on the meta-theoretical side are Nipkow's implementation
of Hoare logic in Isabelle/HOL [24], the Java-light logic by von Oheimb [29],
Kleymann's thesis [15], and Hofmann's [10] work on completeness of program logics.
The logic by Nipkow in [24] is for a while-language with parameterless functions, with
proofs of soundness and completeness. Several techniques we use in our treatment are
inspired by this work, such as modelling of the heap via mini-heaps. However, we have
made progress on the treatment of mutual recursion and adaptation. Several options for
formalising either VDM or Hoare-style program logics have been explored by Kleymann [15].
In particular this work demonstrates how to formalise an adaptation rule
that permits modifying auxiliary variables. The techniques used in our completeness
proof are based on those by one of the authors in [10].

The program logic for Java-light by von Oheimb [29] is encoded in Isabelle/HOL
and proven sound and complete. It covers more object-oriented features, but works on
a higher level than our logic for a bytecode language and does not cover resources.
Moreover, it is hardly suitable for concrete program verification.

With respect to other relevant program logics, de Boer [8] presents a sound and
complete Hoare-style logic for a sequential object-oriented language with inheritance
and subtyping. In contrast to our approach, the proof system employs a specific
assertion language for object structures, whose WP calculus is heavily based on
syntactical substitutions. Recently a tool supporting the verification of annotated programs
(flowcharts) yielding verification conditions to be solved in HOL has been produced [5].
This also extends to multi-threaded Java [2].

Abadi and Leino combine a program logic for an object-oriented language with a
type system [1, 16]. The language supports sub-classing and recursive object types and
attaches specifications as well as types to expressions. In contrast to our logic, it uses
a global store model, with the possibility of storing pointers to arbitrary methods in
objects. As a result of this design decision this logic is incomplete. An implementation
of this logic and a verification condition generator are described in [28].

Several projects aim at developing program logics for subsets of Java, mainly as
tools for program development. Müller and Poetzsch-Heffter present a sound Hoare-style
logic for a Java subset [22]. Their language covers class and interface types with
subtyping and inheritance, as well as dynamic and static binding, and aliasing via object
references; see also the Jive tool [20]. As part of the LOOP project, Huisman and
Jacobs [13] present an extension of a Hoare logic that includes means for reasoning about
abrupt termination and side-effects, encoded in the PVS theorem prover. Krakatoa [18]
is a tool for verifying JML-annotated Java programs that acts as front-end to the Why
system [9], using Coq to model the semantics and conduct the proofs. Why produces
proof-obligations for programs in imperative-functional style via an interpretation in a
type theory of effects and monads. Similarly, the JACK environment [6] targets
verification conditions for the B system, generated from JML annotations, though much
effort is invested in making the system usable by Java programmers. We also mention [19],
which embeds a Hoare logic in HOL, following previous work by Mike Gordon, to
reason about pointer programs in a simple while-language. As an example, the authors
provide an interactive proof in Isar of the correctness of the Schorr-Waite algorithm.
Finally, [21] proves properties of the JVM in ACL2 directly from invariants and the
operational semantics, that is, without resorting to a VCG.

6 Conclusions 

This paper has presented a resource-aware program logic for Grail, together with proofs
of soundness and completeness. Our logic is unique in combining reasoning about resources
for a general object-oriented language with completeness results for this logic.
Grail is an abstraction over the JVM bytecode language which can be given a
semi-functional semantics. We have developed admissible rules to work with mutually
recursive methods, including parameter adaptation. While the logic already covers dynamic
method invocation, we left a formalisation of the class hierarchy for future research.
The logic has been encoded in the Isabelle/HOL theorem prover, and the formalisation
of the soundness and completeness proofs provides additional confidence in the results.
We demonstrated the usability of the logic by giving some examples, where we proved
concrete resource bounds on space and time. These example programs have been generated
by the Camelot compiler, indicating that the logic is sufficiently expressive to serve
as the target logic in our proof-carrying-code infrastructure. In order to mechanise the
verification of concrete programs, we are currently defining more specialised logics for
various resources. These logics are defined in terms of the logic presented in this paper
and thus inherit crucial properties such as soundness.
Abstract. Proof assistants based on type theory, such as Coq or Lego,
put emphasis on inductive specifications and proofs by featuring
expressive inductive types. It is frequent to modify an existing specification
or proof by adding or modifying constructors in inductive types. In the
context of language design, adding constructors is a common practice
to make progress step-by-step. In this article, we propose a mechanism
to extend and parameterize inductive types, and a proof reuse mechanism
founded on the reuse of proof terms.



1 Motivations 

Using a theorem prover to specify and prove properties increases the confidence
we can have in proofs. It often happens that we write specifications and prove
theorems, then modify specifications and check if theorems are still valid. Since
proving is time and effort consuming, reusing existing specifications and proofs
can be very useful. However, this problem is difficult and much work has addressed
proof reuse.

Proof assistants based on type theory, such as Coq or Lego, put emphasis
on inductive specifications and proofs by featuring expressive inductive types.
Extending an inductive type by adding cases in the specifications is a frequent
practice. But, in this case all the proofs must be replayed and updated. Usually
one will reuse the previous proofs with the cut and paste facilities of his/her
favorite editor. Adding constructors step-by-step allows one to reuse or adapt existing
specifications and proofs, to save time. In the context of language semantics, this
kind of approach is frequently adopted: as in [Pie00], we are often interested in
verifying if a semantic property remains valid even if new constructions are
imported. A famous example is a property related to type systems and evaluation,
that is the well-known subject reduction theorem (SRT): the execution of a
well-typed program will never meet a type error. In pen and paper proofs of SRT,
we can admit that a proof is similar to the previous one for the common cases.
We often read: "the proof is similar to Damas-Milner's". Many embeddings have
been performed in theorem provers [NOP00,Dub00,Sym99,Van97]. Nevertheless
in machine assisted proofs, no tool exists and the reuse is simply done by cut
and paste of the proof script, checking where modifications are necessary.

To formalize and prove a property, we first need to give specifications with
abstract types, inductive types, functions and predicates. Then, before proving
a complex theorem we can prove preliminary but simple lemmas, or use a proof
library. This collection of definitions and proofs is called a formal development.
Our mechanism relies on the dependencies introduced between the components
of a formal development. Indeed, when we add constructors in an inductive type
(such as the abstract syntax of expressions, or the type system), all lemmas
proved by induction or case analysis need to be proved again. The new proofs
contain the old unchanged cases and new cases, corresponding to the
supplementary constructors. So, the script of the previous version of the lemma can be
reused; it only needs to be completed for the new cases.

The Coq system [Bar03] is a theorem prover based on the Calculus of
Constructions, a typed λ-calculus. C. Paulin [PM96] extended this theory to the
Calculus of Inductive Constructions (CIC), where inductive types are native
objects in the system. An inductive definition is specified by its name, its type, and
the names and the types of the constructors.

The reuse we address here is restricted to reusing proofs after extension of
inductive types or after adding parameters. When we add one or more
constructors in an inductive definition, other constructors are unchanged. However,
when a parameter is added, the existing constructors need to be updated by a
dummy parameter. The modifications in inductive types we describe are on top
of the CIC; our purpose is not to provide a new calculus incorporating extension
of inductive types. We adopt a simple and pragmatic approach: we propose a
syntactic construction to extend and parameterize an inductive type, and a proof
reuse mechanism illustrated on top of Coq. Reusing proofs can be realized in
different ways: reusing scripts or reusing proof terms. We discuss briefly the first
approach. The paper focuses on the second approach, which is definitely more
powerful.

The paper is organised as follows. In section 2, we present some related work.
In section 3, an example illustrates a scenario of extension and reuse. Section
4 presents a first attempt that relies on proof scripts. Then, in section 5 we
describe a calculus of dependencies and the consequences on proofs. In section
6, we present the extension and parameterisation of inductive types, and our
approach of formal reuse based on proof terms. In the last section, we formalize
our approach.

2 Related Work 

In this section we survey work that deals with adding constructors in an inductive
type or more generally with reusing proofs. We also address the embedding
of language semantics in proof checkers, because we focus on that application
domain.

Extending an inductive type I into J with new constructors establishes a
subtyping relation because all the elements defined with the constructors of I can
also be defined with the constructors of J. E. Poll [Pol97] formalizes a notion
of subtyping by adding constructors in an inductive type. However, he focuses
on programs, and consequences on proofs are not explored. G. Barthe [BR00]
proves that Constructor Subtyping is well-behaved, but there is no corresponding
implementation.

When a type I is a subtype of J and $\Gamma \vdash x : I$ is valid, we would like
$\Gamma \vdash x : J$ to be valid too. Coercive subtyping consists in giving rules and a
mechanism to allow such derivations. The tool Plastic [Cal00] implements coercive
subtyping [Luo96]. The Coq system also incorporates a mechanism of implicit coercion.
When defining a coercion from I to J, the user can transparently use objects
of type I when a type J is expected. Adding constructors looks like adding
methods in object oriented programming languages. We could expect to have an
analogy with inheritance. However, E. Poll showed in [Pol97] that these two notions of
adding are dual: adding a constructor in an inductive type produces a supertype,
whereas adding a method produces a subtype. The comparison stops here, and
our proof reuse will not use coercive subtyping.

Work on proof reuse exists in the domain of type isomorphisms [Mag03,BP01].
But in our context the extended types are no longer isomorphic to the initial
ones and we cannot explore this way.

With the tool TinkerType [LP99] developed by B. Pierce and M. Levin we
can construct type systems by assembling a choice of typing rules from among
a hundred. The system checks the consistency and produces a type checker.
Unfortunately this framework does not provide any proof capacity.

S. Gay [Gay01] proposes a framework to formalize type systems for the π-calculus
(a calculus of mobile processes) in Isabelle/HOL. In this framework the
formalization of type soundness proofs, a general theory of type environments and the use
of a meta-language facilitate the reuse for variations on type systems. However,
this reuse is finally done by hand.

The toolset Jakarta [BDHdS01] allows one to reason about the JavaCard
platform. [BG02] proposes to generate an elimination principle from a recursive
function, to automate reasoning about executable specifications. The technique
used in this work is close to ours: we both implement a tactic that transforms
λ-terms and generates proof obligations.

Tools exist to help in machine assisted semantics or more generally in formal
proof, but, as far as we know, in the context of automated proofs no tool provides
extension of inductive types with the feature of formal proof reuse.

3 A First Example 

We describe here an example to illustrate the user's point of view of our proof
reuse. The example presented here is very simple for lack of space. Its purpose is to
illustrate how to reuse a formal development with some new tactics incorporated
in Coq (v7.4). We first define the type of very simple expressions built from
natural numbers and additions:

    Inductive expr : Set :=
        C : nat → expr | Add : expr → expr → expr.

The predicate (eval e n) claims that the expression e evaluates to n (plus is
the function implementing addition between natural numbers).

    Inductive eval : expr → nat → Prop :=
        evalC : ∀n:nat (eval (C n) n)
      | evalAdd : ∀e1,e2:expr ∀n1,n2:nat
            (eval e1 n1) → (eval e2 n2) →
            (eval (Add e1 e2) (plus n1 n2)).

Then we prove by induction on the first hypothesis that eval is deterministic.

    Lemma eval_det : ∀e:expr ∀n,m:nat (eval e n) → (eval e m) → n=m.

Now, we add variables in our language, so we need to define a new type
for these expressions. We provide the command Extend .. as .. with .. to
generate this new type. We first introduce the abstract type var of variables.

    Parameter var : Set.

    Extend expr as expr2 with Var : var → expr2.

This last command generates the following definition, where old constructors are
renamed by fresh identifiers.

    Inductive expr2 : Set :=
        C0 : nat → expr2
      | Add0 : expr2 → expr2 → expr2
      | Var : var → expr2.

The type expr2 is now available, together with the corresponding elimination
principles generated by Coq.

Before proving the determinism of the new language, we need to extend
eval since it is used by the lemma. The evaluation now requires an environment
represented as a function from identifiers to natural numbers. So, we extend eval
and we parameterize it with an environment rho.

    Definition env := (var → nat).

    Parameter eval as eval2 with rho : env
      and extend with evalVar : (v:var) (eval2 rho (Var v) (rho v)).

This is equivalent to the following definition

    Inductive eval2 [rho : env] : expr2 → nat → Prop :=
        evalC0 : ∀n:nat (eval2 rho (C0 n) n)
      | evalAdd0 : ∀e1,e2:expr2 ∀r1,r2:nat
            (eval2 rho e1 r1) → (eval2 rho e2 r2)
            → (eval2 rho (Add0 e1 e2) (plus r1 r2))
      | evalVar : ∀v:var (eval2 rho (Var v) (rho v)).

The type eval2 is now parameterized by rho, and occurrences of eval2 in the
types of constructors have to be parameterized by rho too. Now, the new lemma
can be proved by using our tactic Reuse.

    Lemma eval_det2 : ∀rho:env ∀e:expr2 ∀n,m:nat (eval2 rho e n) →
        (eval2 rho e m) → n=m.
    Intro rho.
    Reuse eval_det.

One subgoal is generated, for the variable case of the induction. We solve it with
usual tactics.

      rho : env
      v : var
      ============================
      ∀m:nat (eval2 rho (Var v) m) → (rho v)=m

    Intros m Heval; Inversion Heval; Trivial.

4 Script Reuse 

We describe here a first and naive implementation of proof reuse based on tactic- 
style proof construction. 

In the COQ system, as in many other proof assistants such as Lego [LP92], HOL
[GM93] and NuPRL [CAB+86], the standard way of building a proof is an
interactive goal-directed process: the proof is obtained by successive refinements
described by tactics. A tactic is applied to the current goal to split it into one or
more subgoals, usually simpler to prove. A tactic may be a basic tactic (such as
Intro or Case) or a composite one. Composite tactics are built with tacticals
(for instance ; or orelse), defined by the user in the language Ltac [Del00],
or provided as decision procedures (for instance Omega).

The most basic method of reusing a proof is to replay the script of the old proof
in order to construct an incomplete proof automatically. In the previous example, if
we replay the script of eval_det to prove eval_det2, one subgoal remains, which
corresponds to the new case.

From the user's point of view, a proof is just a list of tactics. Although a
script is linear, it hides a tree structure called a tactic tree. A tactic tree is a
tree of sequents (a goal and a context) where each node carries a status, closed
or open, and the tactic which refines the node into its children.
When the status of a node is closed, the subtrees of the node are the result of
applying the tactic to the node. When the status is open, the list of subtrees
is empty, as no tactic has been applied yet.

The tactic tree in COQ is maintained internally until the proof is completed.
In our context of extending and reusing, we do not throw it away: we keep and
save it as an annotated script, where an annotation is the path, in the tactic tree,
of the tactic application.

We assume that a property L is conservative with respect to the extension of the
type I. Then the set of nodes in the tactic tree of the complete proof of L
before the extension of I constitutes a tactic tree of an incomplete proof of L
after the extension of I:

— the new tactic tree contains all the paths contained in the old tactic tree 

— for all common paths, the context, the goal and the tactic in the node of the 
old tactic tree are the same in the new one modulo renaming of extended 
lemmas and functions 
Indeed, when a tactic related to an inductive definition I, such as Case or
Induction, is applied to a goal, the generated subgoals are presented in the
same order as the constructors in the definition of I. When we extend the type
I by adding m new constructors, these new constructors are added at the end of
the new inductive type. This is crucial to keep the same paths when we construct
the new tactic tree. The only difference in the new tactic tree is that some nodes
have supplementary children whose status is open. These new subgoals are not
proved, and the user must complete them by giving tactics.
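
The replay of an annotated script can be sketched as follows. This is only an illustrative model, not the COQ implementation: the `Node` class, the `toy_prover` function and the script contents (including the use of `Auto` to close the old cases) are assumptions made for the example. The toy prover generates one subgoal per constructor for `Induction e` and closes the goal for any other tactic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Path = Tuple[int, ...]  # position of a node in the tactic tree; the root is ()


@dataclass
class Node:
    goal: str
    tactic: Optional[str] = None  # None means the node is open
    children: List["Node"] = field(default_factory=list)


def replay(script: Dict[Path, str],
           apply_tactic: Callable[[str, str], List[str]],
           goal: str, path: Path = ()) -> Node:
    """Rebuild a tactic tree by looking up the tactic recorded at each path."""
    tactic = script.get(path)
    node = Node(goal, tactic)
    if tactic is not None:
        for i, subgoal in enumerate(apply_tactic(goal, tactic)):
            node.children.append(replay(script, apply_tactic, subgoal, path + (i,)))
    return node


def annotate(node: Node, path: Path = ()) -> Dict[Path, str]:
    """Flatten a tactic tree into an annotated script: path -> tactic."""
    script: Dict[Path, str] = {}
    if node.tactic is not None:
        script[path] = node.tactic
    for i, child in enumerate(node.children):
        script.update(annotate(child, path + (i,)))
    return script


def open_paths(node: Node, path: Path = ()) -> List[Path]:
    """Paths of the nodes that still have to be proved by the user."""
    if node.tactic is None:
        return [path]
    found: List[Path] = []
    for i, child in enumerate(node.children):
        found.extend(open_paths(child, path + (i,)))
    return found


def toy_prover(constructors: List[str]) -> Callable[[str, str], List[str]]:
    """Hypothetical prover: induction yields one subgoal per constructor, in order."""
    def apply_tactic(goal: str, tactic: str) -> List[str]:
        if tactic == "Induction e":
            return [f"{goal} / case {c}" for c in constructors]
        return []  # every other tactic closes its goal in this toy model
    return apply_tactic


old_script = {(): "Induction e", (0,): "Auto", (1,): "Auto"}
old_tree = replay(old_script, toy_prover(["C", "Add"]), "eval_det")
new_tree = replay(old_script, toy_prover(["C", "Add", "Var"]), "eval_det2")
print(open_paths(old_tree))  # no open node: the old proof is complete
print(open_paths(new_tree))  # the node for the new constructor is open
```

Because the new constructor is appended at the end, the paths `(0,)` and `(1,)` still designate the old cases, and only the node at path `(2,)` remains open.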

This script approach provides a mechanized and safe cut-and-paste facility. It
is a very simple way of reusing proofs, easy to implement and portable to any
tactic-oriented proof assistant. We have implemented it for COQ.

However, this approach suffers from some drawbacks. The first one is that the
script of the old version of the proof must be played and annotated with paths.
Then the whole proof script is played again to build the new incomplete
proof, which may be time-consuming in large developments.

A second drawback concerns the extension of functions defined by case analysis
on an extended type. The success of the reuse relies on the hypothesis that
the function has been extended in a conservative way. This method cannot easily
check this requirement.

Another drawback is that we have to preprocess the tactic tree before
annotating it. Indeed, a composite tactic may use the tactical ; which
factorizes some tactics. In Induction x; Apply H the tactic Apply H is applied
to all subgoals generated by Induction x. When the type of x is extended, it
may happen that Apply H does not apply to the new branches. The solution is
to transform all composite tactics into simple ones. An algorithm can be found
in [Pon99].
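
The preprocessing step can be illustrated by the following sketch. It is not the algorithm of [Pon99], only a simplified model under stated assumptions: a composite tactic is represented as a pair `(first, second)` where `first` is a basic tactic, and the hypothetical function `arity` gives the number of subgoals that a basic tactic generated in the old proof.

```python
from typing import Callable, Dict, Tuple, Union

Path = Tuple[int, ...]
# A tactic is a basic tactic (a string) or a sequence `first; second`,
# modelled here as the pair (first, second) with `first` basic.
Tactic = Union[str, Tuple[str, "Tactic"]]


def expand(tactic: Tactic, arity: Callable[[str], int],
           path: Path = ()) -> Dict[Path, str]:
    """Turn `first; second` into one simple tactic per path of the old proof."""
    if isinstance(tactic, str):
        return {path: tactic}
    first, second = tactic
    script = {path: first}
    # `second` is attached only to the subgoals that existed in the old proof.
    for i in range(arity(first)):
        script.update(expand(second, arity, path + (i,)))
    return script


# In the old proof, Induction x generated two subgoals (two constructors).
old_arity = lambda t: {"Induction x": 2}.get(t, 0)
script = expand(("Induction x", "Apply H"), old_arity)
print(script)
```

After expansion, `Apply H` is recorded only at the paths of the old constructors, so a subgoal created for a new constructor is left open instead of making the replay fail.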

Finally, with this approach, we can extend a type but cannot parameterize it
or consider other modifications. To remedy these drawbacks, we propose another
approach which operates directly on the λ-terms representing proofs, called proof
terms in the following.



5 Dependencies 

When we want to reuse proofs after modifications of types among specifications 
in a formal development, some of the proofs in the development may not be 
concerned by the modified types. In that case, they do not need to be modified. 
This section describes a mechanism to compute dependencies between lemmas, 
functions and types. Dependencies are the underpinning of our reuse approach 
based on proof terms. 



5.1 Computing Dependencies 

The COQ system constructs a proof term (a λ-term of the CIC) from a complete
proof script or from a definition. COQ type-checks
it and stores it in its environment. A λ-term may contain free identifiers
representing previously defined objects, for example functions, intermediate lemmas,
or elimination principles, stored in the environment.

Definition 1. If $L_1$ and $L_2$ are two identifiers denoting definitions, lemmas,
functions, or types, a dependency between $L_1$ and $L_2$ exists when $L_1$ is a free
identifier occurring in the λ-term associated with $L_2$.

As in [PBR98], we compute the dependencies of a lemma L by looking for
free identifiers in the λ-term associated with the proof of L. A dependency graph
represents all the dependencies between the objects of a development. Such a
graph contributes to the documentation of the proof development and is useful
to support proof extension and maintenance.
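
The computation of Definition 1 can be sketched on a toy term language. The tuple encoding of terms and the sample environment below are assumptions made for this illustration; they are not the internal representation used by COQ.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Toy λ-terms: ("id", name) | ("lam", x, type, body) | ("app", fun, arg)
Term = Tuple


def free_identifiers(term: Term, bound: FrozenSet[str] = frozenset()) -> Set[str]:
    """Identifiers occurring in `term` that are not bound by an abstraction."""
    kind = term[0]
    if kind == "id":
        return set() if term[1] in bound else {term[1]}
    if kind == "lam":
        _, x, ty, body = term
        return free_identifiers(ty, bound) | free_identifiers(body, bound | {x})
    if kind == "app":
        return free_identifiers(term[1], bound) | free_identifiers(term[2], bound)
    raise ValueError(f"unknown term: {term!r}")


def dependency_graph(env: Dict[str, Term]) -> Dict[str, Set[str]]:
    """Map each object of the environment to the identifiers it depends on."""
    return {name: free_identifiers(term) for name, term in env.items()}


# Hypothetical environment: `lemma` is λn:nat. (plus n)
env = {
    "plus": ("lam", "n", ("id", "nat"), ("app", ("id", "nat_rec"), ("id", "n"))),
    "lemma": ("lam", "n", ("id", "nat"), ("app", ("id", "plus"), ("id", "n"))),
}
graph = dependency_graph(env)
print(graph["lemma"])
```

Here `lemma` depends on `nat` and `plus`, while the bound variable `n` is not reported as a dependency.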

5.2 Transparent and Opaque Dependencies 

A dependency graph can be considered as a proof plan. Since the extensions we 
could perform are supposed to be conservative, the graph is expected to remain 
identical after extension. The graph shows the dependencies between objects 
stored in the database, and we want to detect which nodes - which proofs - are 
unchanged and which ones need to be modified after an extension. So, for all 
lemmas that are not linked in the graph to one of the extended types, we know 
their proof will not need to be completed. 

Nevertheless, if a lemma L is linked to an extended type I, it is possible that
the proof of L does not need modifications. Indeed, if the proof of L only uses the
type I, and possibly other lemmas, the fact that I now has more constructors
does not matter. So the basic notion of dependency becomes too weak. We will
now distinguish transparent dependencies and opaque dependencies.

Definition 2. An object L has a transparent dependency with an inductive type
I, if L has a dependency with an induction principle of I, or if a case analysis
on type I is performed in the λ-term representing L.

Definition 3. An object L has an opaque dependency with an inductive type I, 
if its dependency with I is not transparent. 

5.3 Consequences on Proofs 

When we extend one or more types, we compute the dependency graph of the 
formal development, and we distinguish transparent and opaque dependencies. 

For every lemma L of this graph, if L has a transparent dependency with an
extended inductive type, we know that its proof will have to be completed when we
use our reuse method. The method generates only the subgoals corresponding
to the new cases of the extended inductive type. Then the user completes the
proof by providing the tactics to fill the holes. For the other lemmas, the proof
remains exactly the same: reuse is completely automatic as there is no hole to
be filled.
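
The consequences of Definitions 2 and 3 can be summarized in a small sketch. The data below is an assumption made for the example: `deps` lists the dependencies of each lemma, `eliminations` lists the inductive types on which a lemma uses an induction principle or a case analysis, and the lemmas `expr_inhabited` and `plus_comm` are hypothetical.

```python
from typing import Dict, Optional, Set


def classify(obj: str, ind_type: str, deps: Dict[str, Set[str]],
             eliminations: Dict[str, Set[str]]) -> Optional[str]:
    """Kind of dependency of `obj` on `ind_type` (Definitions 2 and 3)."""
    if ind_type in eliminations.get(obj, set()):
        return "transparent"
    if ind_type in deps.get(obj, set()):
        return "opaque"
    return None


def plan(extended: Set[str], deps: Dict[str, Set[str]],
         eliminations: Dict[str, Set[str]]) -> Dict[str, str]:
    """Decide, for each lemma, what reuse will require after the extension."""
    result = {}
    for obj in deps:
        kinds = {classify(obj, t, deps, eliminations) for t in extended}
        if "transparent" in kinds:
            result[obj] = "complete new cases"
        elif "opaque" in kinds:
            result[obj] = "update names only"
        else:
            result[obj] = "unchanged"
    return result


deps = {
    "eval_det": {"expr", "eval", "nat"},
    "expr_inhabited": {"expr", "nat"},  # hypothetical: mentions expr, no case analysis
    "plus_comm": {"nat", "plus"},       # hypothetical: unrelated to expr and eval
}
eliminations = {"eval_det": {"expr", "eval"}}
result = plan({"expr", "eval"}, deps, eliminations)
print(result)
```

Only `eval_det` yields subgoals for the user; the two other lemmas are handled without user interaction.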
Figure 1 shows the graph generated for the example of section 3,
where eval_det has a transparent dependency with the extended types expr
and eval. Dotted arrows stand for transparent dependencies, and gray
nodes show the objects concerned by an extension.




Fig. 1. A dependency graph 

The proof term of a lemma L may contain free identifiers that are dependen- 
cies in the generated graph. The dependencies can be lemmas or functions and 
have to be updated before proving L. 

6 To Extend, Parameter and Reuse 

In this section, we first give the syntax of the commands to extend and parameterize
an inductive type. Then we describe the reuse approach based on proof terms.

6.1 Extension of an Inductive Definition 

We propose the following syntax to extend an inductive type: 

Extend I as J with c1 : u1 | ... | cm : um. (with m ≥ 0)

An object of type J can be built from the constructors of I or the new ones:
c1, ..., cm. This syntactic construction is compiled into a COQ inductive type.
As COQ does not allow overloading, we have to rename the constructors of I.
We give here the scheme of the compilation. Let I be defined by:

Inductive I [list of parameters] : T :=
   d1 : t1 | ... | dn : tn.

The clause:

Extend I as J with c1 : u1 | ... | cm : um.

is compiled into

Inductive J [list of parameters] : T :=
   d1' : t1[I := J] | ... | dn' : tn[I := J]
 | c1 : u1 | ... | cm : um.

The substitution [I := J] replaces the free occurrences of I in the types
t1, ..., tn by J. The constructors di of I are renamed by fresh names di' to
ensure their uniqueness. Elimination principles on J are automatically generated
by COQ.
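
The compilation scheme above can be modelled as follows. This is a minimal sketch under simplifying assumptions: a constructor type such as nat → expr is encoded as the tuple `("nat", "expr")`, parameters are omitted, and fresh names are obtained by appending a quote, as in the scheme (the names generated by the actual prototype may differ).

```python
from dataclasses import dataclass
from typing import List, Tuple

CtorType = Tuple[str, ...]  # ("nat", "expr") encodes nat → expr


@dataclass(frozen=True)
class Inductive:
    name: str
    sort: str
    constructors: Tuple[Tuple[str, CtorType], ...]


def extend(ind: Inductive, new_name: str,
           new_constructors: List[Tuple[str, CtorType]]) -> Inductive:
    """Model of `Extend I as J with c1 : u1 | ... | cm : um`."""
    def subst(ty: CtorType) -> CtorType:
        # The substitution [I := J] on the type of an old constructor.
        return tuple(new_name if part == ind.name else part for part in ty)

    used = {c for c, _ in new_constructors}
    renamed = []
    for ctor, ty in ind.constructors:
        fresh = ctor + "'"
        while fresh in used:  # keep constructor names unique
            fresh += "'"
        used.add(fresh)
        renamed.append((fresh, subst(ty)))
    # Old constructors keep their order; new ones are appended at the end.
    return Inductive(new_name, ind.sort, tuple(renamed) + tuple(new_constructors))


expr = Inductive("expr", "Set", (
    ("C", ("nat", "expr")),
    ("Add", ("expr", "expr", "expr")),
))
expr2 = extend(expr, "expr2", [("Var", ("var", "expr2"))])
for ctor, ty in expr2.constructors:
    print(ctor, ":", " → ".join(ty))
```

Appending the new constructors after the renamed ones preserves the order of the old cases, which is what both the script approach and the proof term approach rely on.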

6.2 Parameterisation of an Inductive Type 

An inductive type I can have one or more parameters p:P. In this case, in the
types of the constructors, all occurrences of I are parameterized by p. We give here
the syntax to add supplementary parameters to a type I when we extend it. Let
I be the type:

Inductive I [k1:h1; ...; kl:hl] : T :=
   d1 : t1 | ... | dn : tn.

The following command

Parameter I as J with p1:t1; ...; pq:tq
and extend with c1 : u1 | ... | cm : um.

is compiled into

Inductive J [k1:h1; ...; kl:hl; p1:t1; ...; pq:tq] : T :=
   d1' : t1[(I k1..kl) := (J k1..kl p1..pq)] | ...
 | dn' : tn[(I k1..kl) := (J k1..kl p1..pq)]
 | c1 : u1 | ... | cm : um.

The substitutions [(I k1..kl) := (J k1..kl p1..pq)] in the types ti rename the
occurrences of I into J and complete the list of parameters of J. Once again,
elimination principles on J are generated by COQ.

6.3 Reusing Proofs 

We present here the approach based on proof terms reuse after extension of 
inductive types in the COQ proof assistant, for conservative problems. Conser- 
vative means that when we extend types, the lemmas are still true for these new 
types. 

In section 5, we explained that the extension of an inductive type I has
consequences on all dependencies of I. In particular, proofs by induction on I need
to be completed. The important contribution of our method is to automatically
reuse old proofs. Our approach is very simple and consists in extending λ-terms,
by adding holes to be filled later.

Within the COQ system, instead of giving tactics to build a proof, the user can
also directly provide a proof term. This proof term can be a complete one or an
incomplete one, using the tactic Refine implemented by J.-C. Filliâtre [Bar03]. An
incomplete term is a term containing metavariables. This tactic takes a proof
term as argument, with metavariables representing holes in the proof. Then COQ
generates as many subgoals as there are holes in the proof term.

The principle of our Reuse <lemma> command is to reuse the old proof term
of <lemma> by completing the pattern-matching of case analysis (which has to be
exhaustive), and adding holes for the new constructors. If we have parameterized
a type I, we add dummy parameters everywhere I is applied. We also update
the names of the extended types, the constructors, and the recursors associated
with the extended types. Then we provide this term as a proof term to a patched
version of the tactic Refine (patched by the author).

We also provide a command Reuse <lemma> as <lemma'>. The system first
creates the new lemma <lemma'> and starts the proof by using the previous reuse
method. The difference with Reuse <lemma> is that the user does not need to
state the new lemma: it is automatically generated from <lemma> by applying
the substitutions of all updated types.

The system allows the user to extend one or more inductive types (as shown in the
initial example with expr and eval), and also to extend a type in different ways.



6.4 Extension of Functions 



Functions defined by fixpoint or case analysis use exhaustive pattern matching.
So, if this pattern matching relies on an extended type I, the function has to
be extended. This is detected in the dependency graph because such a function
has a transparent dependency on I. For example, imagine an inductive type I
with two constructors c1 and c2 whose respective types are I and I → I, and a
function f : I → t defined by:

λx:I. Cases x of
        c1 => e1
      | c2 y => Cases y of c1 => e2
                         | c2 z => e3
                end
      end

Now, we extend I in J with a constructor c3 of type I → I. The following
command:

Complete f as f'.

updates the names in the λ-term of f, and completes the pattern-matching on
x by adding metavariables, denoted ?n in COQ.

The resulting λ-term of f' : J → t is as follows:

λx:J. Cases x of
        c1' => e1
      | c2' y => Cases y of c1' => e2
                          | c2' z => e3
                          | c3 u => ?1
                 end
      | c3 v => ?2
      end

To define the extended function, we refine the extended λ-term, as we did for
proofs. The COQ system generates subgoals corresponding to the metavariables.
To help the user, we generate from the incomplete λ-term equations giving the
shape of each matching clause. For the previous example, the equations would
be

(f' (c2' (c3 u)))=

(f' (c3 v))=

This computation requires a tree traversal in the incomplete λ-term.
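
The completion of the pattern-matching and the generation of the equations can be sketched as two tree traversals. The encoding is an assumption made for this illustration: a term is either a leaf (a string) or `("Cases", var, branches)`, every case analysis is assumed to be on the extended type, and the fresh argument names `u1_0`, `u2_0` are generated by the sketch rather than chosen as in the example above.

```python
from itertools import count
from typing import Dict, Iterator, List, Tuple, Union

# A term is a leaf (string) or ("Cases", var, [(constructor, args, body), ...]).
Term = Union[str, Tuple]
Pattern = Union[str, Tuple]  # a variable, or (constructor, [sub-patterns])


def complete(term: Term, rename: Dict[str, str],
             new_ctors: List[Tuple[str, int]], holes: Iterator[int]) -> Term:
    """Rename old constructors and add one branch `?n` per new constructor."""
    if isinstance(term, str):
        return term
    _, var, branches = term
    done = [(rename[c], args, complete(body, rename, new_ctors, holes))
            for c, args, body in branches]
    for ctor, arity in new_ctors:
        n = next(holes)
        done.append((ctor, [f"u{n}_{k}" for k in range(arity)], f"?{n}"))
    return ("Cases", var, done)


def substitute(pattern: Pattern, var: str, replacement: Pattern) -> Pattern:
    if isinstance(pattern, str):
        return replacement if pattern == var else pattern
    ctor, args = pattern
    return (ctor, [substitute(a, var, replacement) for a in args])


def show(pattern: Pattern) -> str:
    if isinstance(pattern, str):
        return pattern
    ctor, args = pattern
    return ctor if not args else "(" + " ".join([ctor] + [show(a) for a in args]) + ")"


def equations(term: Term, fname: str, pattern: Pattern) -> List[str]:
    """One equation per metavariable, giving the shape of its matching clause."""
    if isinstance(term, str):
        return [f"({fname} {show(pattern)}) = {term}"] if term.startswith("?") else []
    _, var, branches = term
    eqs: List[str] = []
    for ctor, args, body in branches:
        eqs += equations(body, fname, substitute(pattern, var, (ctor, list(args))))
    return eqs


f = ("Cases", "x", [
    ("c1", [], "e1"),
    ("c2", ["y"], ("Cases", "y", [("c1", [], "e2"), ("c2", ["z"], "e3")])),
])
f2 = complete(f, {"c1": "c1'", "c2": "c2'"}, [("c3", 1)], count(1))
eqs = equations(f2, "f'", "x")
for eq in eqs:
    print(eq)
```

Existing branches are processed before the new ones are appended, so the inner hole is numbered ?1 and the outer one ?2, as in the example of this section.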



7 Correctness of the Proof Term Approach 

We deal here with metavariables in λ-terms to represent incomplete proofs. Then
we formalize our modifications to reuse proof terms.



7.1 Metavariables 

To represent incomplete proofs in λ-calculus, we use metavariables for holes in
the proof term, as in Alf [Mag94]. Substitution of usual variables cannot refine
metavariables. We need a λ-calculus with metavariables and a mechanism of
instantiation. We use here the notations introduced by C. Muñoz, and the main
property of [Muñ96].

Definition 4. A refinement or an instantiation of a λ-term $A$ replaces a
metavariable $X$ in $A$ by a term $t$ without renaming any bound variables. We denote
it by $A\{X \mapsto t\}$.

To each metavariable $X$, we associate its type $T_X$, the context $\Gamma_X$ where $X$
has been defined, and the implicit typing rule $\Gamma_X \vdash X : T_X$.

Definition 5. A signature $\Sigma$ is a list of metavariable declarations $\Gamma_X \vdash X : T_X$.

To type a λ-term $A$ with metavariables, we need the typing context $\Gamma$ for
variables, but we also need the signature $\Sigma$ for the metavariables of $A$. This is
denoted by judgments $\Sigma \triangleright \Gamma \vdash M : A$, where $\triangleright$ separates the signature $\Sigma$ from the
typing context $\Gamma$.

Proposition 1. Type preservation by refinement

If $\Sigma \triangleright \Gamma_X \vdash t : T_X$ and $\Sigma, (\Gamma_X \vdash X : T_X) \triangleright \Gamma \vdash A : \phi$, then $\Sigma \triangleright \Gamma \vdash A\{X \mapsto t\} : \phi$.

In other words, a typing judgment obtained by refinement of a valid typing 
judgment is valid. 
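
The difference between refinement and ordinary substitution can be illustrated on untyped terms. The tuple encoding below is an assumption made for this sketch, and typing is ignored: the point is only that a refinement replaces the metavariable textually, so the replacing term may refer to variables bound around the hole.

```python
from typing import Tuple

# Toy terms: ("var", x) | ("meta", X) | ("lam", x, body) | ("app", fun, arg)
Term = Tuple


def refine(term: Term, meta: str, t: Term) -> Term:
    """Compute term{meta ↦ t} without renaming any bound variable."""
    kind = term[0]
    if kind == "meta":
        return t if term[1] == meta else term
    if kind == "var":
        return term
    if kind == "lam":
        # No renaming: variables of t may be captured by this binder.
        return ("lam", term[1], refine(term[2], meta, t))
    if kind == "app":
        return ("app", refine(term[1], meta, t), refine(term[2], meta, t))
    raise ValueError(f"unknown term: {term!r}")


# λx. ?X refined with X ↦ x gives λx. x: the hole is filled with the bound x.
incomplete = ("lam", "x", ("meta", "X"))
complete_term = refine(incomplete, "X", ("var", "x"))
print(complete_term)
```

A capture-avoiding substitution would rename the bound variable and could not produce the identity function here, which is why a dedicated instantiation mechanism is needed.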

The COQ system provides a metavariable construction and a mechanism
of instantiation, weaker than those of C. Muñoz, but sufficient to represent
incomplete proofs.

$T$ ::= $s$                                                        sort (Set, Prop, Type)
     |  $x$                                                        identifier
     |  $\Pi x : T.T$                                              dependent product
     |  $\lambda x : T.T$                                          abstraction
     |  $T\ \overline{T}$                                          application
     |  $let\ x = T\ in\ T$                                        local definition
     |  $Ind(i : T)[\overline{x : T}]\{\overline{T}\}$             inductive type of name $i$
     |  $Constr(n, i)$                                             $n$th constructor of $i$
     |  $Case\ T \Rightarrow \overline{T}\ end$                    case analysis of $T$
     |  $Fix\ f : T.T$                                             fixpoint construction

Fig. 2. Simplified definition of terms of CCI



7.2 Correctness of the Reuse Method 

We outline here a formalization of the modifications of λ-terms, to be reused
after an extension. Then we formalize the correctness of our reuse method.

A definition of the λ-terms of the CCI can be found in [Wer94,PM96]. We
give in figure 2 a simplified definition, where $\overline{T}$ (or $\overline{x : T}$) stands for a list of
terms $T$ (or a list of pairs $x : T$).

The application $N\ \overline{M}$, where $\overline{M}$ is the list $M_1 \ldots M_q$, is equivalent to the
successive applications $(((N\ M_1)\ M_2) \ldots M_q)$. The construction
$Ind(i : T)[p_1 : T_1 \ldots p_n : T_n]\{N_1 \ldots N_k\}$ is the inductive type of name $i$, of type $T$,
which has $k$ constructors of type $N_1 \ldots N_k$, and where the parameters
$p_1 : T_1 \ldots p_n : T_n$ are explicitly given. The keyword Fix in $Fix\ f : T.N$ binds the
function $f$ in the body $N$. In $Case\ e \Rightarrow \overline{N}$, the cases are examined in the same
order as the constructors of the inductive type of $e$. The terms in the list $\overline{N}$ take
as many arguments as the corresponding constructor.

Let $I$ be $Ind(I : t)[\overline{q : T_q}]\{\overline{T}\}$. We extend $I$ in $J$ with a list of fresh
parameters $\overline{p}$ (to avoid renaming) of type $\overline{T_p}$, and new constructors of type $\overline{K}$. We
define the operation of extension $\lfloor \cdot \rfloor$ by: $\lfloor A \rfloor$ is the λ-term $A$ where applications
of $I$ are completed by the list of dummy parameters $\overline{p}$, where occurrences of
the inductive type $I$ are substituted by $J$, and where constructors of $I$ are
replaced by the corresponding constructors of $J$. Since pattern-matching on
constructors in a case analysis is exhaustive, we complete pattern-matching on
expressions of type $J$ by adding the new constructors of $J$, associated with
metavariables as corresponding expressions. An identifier $x$ (of type $\tau$) depending on $I$
(functions, lemmas ...) is replaced by the identifier $x'$ (of type $\lfloor \tau \rfloor$), supposed to be
in the environment. The constants I_ind, I_rec, I_rect are also replaced by J_ind, J_rec,
J_rect, whose lists of arguments are completed by metavariables. We assume, as
is the case in practice, that induction principles are totally applied. Their arguments are
split up as follows: $I\_ind\ (\overline{q} + P + \overline{N} + \overline{M})$, where $\overline{q}$ are the parameters of $I$, $P$
is the property on which the induction principle is applied, $\overline{N}$ corresponds to the
proofs of $P$ for the constructors of $I$, and $\overline{M}$ are quantifications over arguments
of $P$.

A formal definition of $\lfloor \cdot \rfloor$ is given in figure 3. The transformation is implicitly
parameterized by $I$, $J$ and $\overline{p}$. The symbol $+$ is used to concatenate lists. In $\lfloor \overline{N} \rfloor$,
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II®I 

inx : M.Nj 

X : M_^l 

I/(g+7V)l _ _ 

lIJnd {q + P + N + M)j 
lijrec {q + P + i^+ 
i[7_rec^ {q + P + N +JI)l 
IMNj 
pet X = c in 
lInd{I-.t)[Tn\]{T}j 
llnd{i ■.t)[q: T,]{T}1 
lConstr(n, 7)]| 
lConstr{n, i)| 

[Case M ^ N end] 

{Fix f : t.Mj 



s 

X if X depends on 7 and is updated in x 
X otherwise 

nx-.lMim 

\x-.lMUN\ 

J (<7 + ^+ IP^) _ _ _ 

JJnd {q + p+lPj + 1^1 + X + |M]) 

J_rec (9 + P + IP] + + J^+ im 

JjrectJs + P + + ^ + IMj) 

JA7] JiV] if M / IJnd, Ijrec, Ijrect 
let X = Jc] in Jd] 

Ind{J : P]) [^~l7^ + ^HIT] +K} 
7nd(i:pl)[g:IrJ{lT]} if i ^ 7 

Constrin, J) 

Conatrin, i) if i 7 ^ 7 

Case JM] ^ (|[iVl + X) end if M : 7 

Case JM] => pV] end otherwise 

Fix f : P14M] 



Fig. 3. Definition of J.] when 7 is extended in J 



$\lfloor \cdot \rfloor$ is mapped to all elements of $\overline{N}$. We denote by $\overline{X}$ the list of metavariables, whose
length is that of $\overline{K}$, introduced for the new constructors of $J$. When a metavariable
is added, the signature $\Sigma$ is implicitly enriched with the typing rule of the
metavariable.

Proposition 2.

If $\Gamma \vdash A : \phi$ and if $I$ is extended in $J$ and parameterized by $\overline{p}$ of type $\overline{T_p}$, then
$\Sigma \triangleright \Gamma, \Gamma_I, (\overline{p : T_p}) \vdash \lfloor A \rfloor : \lfloor \phi \rfloor$

where $\Gamma_I$ contains all the updated constants that depended on $I$, and where $\Sigma$ is
the signature of the metavariables introduced by $\lfloor \cdot \rfloor$.

When we want to prove $\lfloor \phi \rfloor$, we assume we have defined all needed dependencies,
with the help of the dependency graph, so that $\Gamma_I$ contains all the updated
constants that depended on $I$. We also add to the typing context the types of the
parameters $\overline{p : T_p}$. We can establish that $\Sigma \triangleright \Gamma, \Gamma_I, (\overline{p : T_p}) \vdash \lfloor A \rfloor : \lfloor \phi \rfloor$ by a structural
induction on $A$, applying the rules of figure 3.

The lemma we want to prove after an extension is $(\overline{p : T_p})\lfloor \phi \rfloor$. The proof term
we produce is $\lambda \overline{p : T_p}.\lfloor A \rfloor$; then the user refines the metavariables. The following
proposition expresses that the produced term has the type of the lemma to prove.

Proposition 3. Our reuse method is correct:

If $\Gamma \vdash A : \phi$, and if $I$ is extended in $J$ and parameterized by $\overline{p}$ of type $\overline{T_p}$, then
$\Gamma, \Gamma_I \vdash \lambda \overline{p : T_p}.\lfloor A \rfloor\{X_i \mapsto t_i\} : (\overline{p : T_p})\lfloor \phi \rfloor$

where $\Gamma_I$ contains all the updated constants that depended on $I$, where $X_i$ are
the metavariables introduced by $\lfloor A \rfloor$, and where $t_i$ are terms whose type is that
of $X_i$ in the signature.
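
The context $\Gamma_I$ is supposed to be produced before trying to extend $A$ with $\lfloor \cdot \rfloor$.
The terms $t_i$ are constructed by the user when tactics are given to solve the goals
produced from the metavariables. As $\Gamma \vdash A : \phi$, we get
$\Sigma \triangleright \Gamma, \Gamma_I, (\overline{p : T_p}) \vdash \lfloor A \rfloor : \lfloor \phi \rfloor$
by proposition 2. By applying the typing rule of the CCI for λ-abstraction,
we obtain: $\Sigma \triangleright \Gamma, \Gamma_I \vdash \lambda \overline{p : T_p}.\lfloor A \rfloor : (\overline{p : T_p})\lfloor \phi \rfloor$. The system instantiates the
metavariables $X_i$, typed in $\Sigma$, from the user's tactics, so
$\Gamma, \Gamma_I \vdash \lambda \overline{p : T_p}.\lfloor A \rfloor\{X_i \mapsto t_i\} : (\overline{p : T_p})\lfloor \phi \rfloor$ follows from proposition 1. □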

If the property we try to prove with our reuse method is not conservative, the 
user will not be able to discharge the generated subgoals. For instance, suppose 
a lemma is proved with an empty inductive type. When we extend this empty 
type, the generated subgoals will be false and the user will not be able to finish 
the proof. 

8 Conclusion 

We have described a formal approach for reusing proofs after extension of in- 
ductive types. A prototype written for COQ 7.4 is available. 

To extend a specification, we only propose here to add constructors to some
inductive types, or to add parameters. Nevertheless this kind of extension is limited,
and we may need to change a definition to incorporate a new notion in the
specifications. A good example of such an extension is the incorporation
of mutable values in a functional kernel of a programming language. As we
have to take the store into account, the reduction relation has more parameters,
the type environment is redefined, and some lemmas are redefined. However, the
global structure of the type soundness proof remains. In [BD01], we have discussed
the problem of redefinition when we want to reuse. Reusing a proof at script level
is then definitely no longer possible.

Our purpose is to allow proof reuse for semantic properties of languages, when
we enrich the language with modifications more general than simply adding
constructors or parameters. Simple modifications include modifying the type of existing
constructors, or the type of an inductive type, by adding premises.

For this kind of modification, we envisage reusing parts of the
proof term, but the modifications of the proof term reach another level of complexity,
and the user will have to provide more information.

Abstract. The technique of reflection is a way to automate proof
construction in type theoretical proof assistants. Reflection is based on the
definition of a type of syntactic expressions that gets interpreted in the 
domain of discourse. By allowing the interpretation function to be par- 
tial or even a relation one gets a more general method known as “partial 
reflection”. In this paper we show how one can take advantage of the 
partiality of the interpretation to uniformly define a family of tactics 
for equational reasoning that will work in different algebraic structures. 
The tactics then follow the hierarchy of those algebraic structures in a 
natural way. 



1 Introduction 

1.1 Problem 

Computers have made formalization of mathematical proof practical. They help 
getting formalizations correct by verifying all the details, but they also make it 
easier to formalize mathematics by automatically generating parts of the proofs. 

One way to automate proving is the technique called reflection. With reflec- 
tion one describes the desired automation inside the logic of the theorem prover, 
by formalizing relevant meta-theory. Reflection is a common approach for proof 
automation in type theoretical systems like NuPRL and Coq, as described for 
example in [1] and [10] respectively. Another name for reflection is “the two-level
approach”.

In Nijmegen we formalized the Fundamental Theorems of Algebra and Cal- 
culus in Coq, and then extended these formalizations into a structured library 
of mathematics named the C-CoRN library [3,5]. For this library we defined a
reflection tactic called rational that automatically establishes equalities of rational
expressions in a field by bringing both to the same side of the equal sign and
then multiplying everything out. With this tactic, equalities like

$$\frac{1}{x} + \frac{1}{y} = \frac{x + y}{xy}$$

can be automatically proved without any human help.

The rational tactic only works for expressions in a field, but using the same
idea one can define analogous tactics for expressions in a ring or a group. The
trivial way to define these is to duplicate the definition of the rational tactic
and modify it for these simpler algebraic structures by removing references to
division or multiplication. This was actually done to implement a ring version
of rational.

However this is not efficient, as it means duplication of the full code of the 
tactic. In particular the normalization function that describes the simplification 
of expressions, which is quite complicated, has to be defined multiple times. But 
looking at the normalization function for field expressions, it is clear that it 
contains the normalization function for rings. In this paper we study a way to 
integrate these tactics for different algebraic structures. 

1.2 Approach 

In the C-CoRN library algebraic structures like fields, rings and groups are orga- 
nized into an Algebraic Hierarchy. The definition of a field reuses the definition of 
a ring, and the definition of a ring reuses the definition of a group. This hierarchy 
means that the theory about these structures is maximally reusable. Lemmas 
about groups are automatically also applicable to rings and fields, and lemmas 
about rings also apply to fields. 

At the same time, a tactic for proving equalities in arbitrary fields was de- 
veloped using a partial interpretation relation, as described in [10]. In this paper 
we show how we can take advantage of this partial interpretation relation to 
reuse the same tactic for simpler structures. As it turns out, the simplification 
of expressions done in a field can be directly applied to rings and groups as 
well. This is quite surprising: the normal forms of expressions that get simplified 
in this theory will contain functions like multiplication and division, operations 
that do not make sense in a group. 

1.3 Related Work 

In the C-CoRN setoid framework, rational is the equivalent of the standard Coq 
tactic field for Leibniz equality (see [7] and [4, Chapter 8.11]). Both tactics were 
developed at about the same time. The field tactic is a generalization of the Coq 
ring tactic [4, Chapter 19], so with the field and ring tactics the duplication of 
effort that we try to eliminate is also present. Also the ring tactic applies to rings 
as well as to semirings (to be able to use it with the natural numbers), so there 
is also this kind of duplication within the ring tactic itself. 

Reflection has also been widely used in the NuPRL system as described 
originally in [1]. More recently, [12] introduces other techniques that allow code 
reuse for tactics in MetaPRL, although the ideas therein are different from ours. 
Since the library of this system also includes an algebraic hierarchy built using 
subtyping (see [15]), it seems reasonable to expect that the work we describe 
could be easily adapted to that framework. 

1.4 Contribution 

We show that it is possible to have one unified mechanism for simplification of
expressions in different algebraic structures like fields, rings and groups. We also
show that it is not necessary to have different normalization functions for these
expressions, but that it is possible to decide equalities on all levels with only
one normalization function. Presently, both the ring and the field versions of
the tactic are used extensively throughout C-CoRN (a total of more than 1,500
times).

Another extension which we present is the addition of uninterpreted function
symbols. With it we can now automatically prove goals of the form $|t_1| = |t_2|$,
which earlier had to be manually simplified to $t_1 = t_2$.

The whole tactic is about 100kb of code, divided between the ML implementation
(17kb), the normalization function (14kb) and the interpretation relation
and correctness (23kb for groups, 25kb for rings and 29kb for fields); in Section 6
we discuss why the correctness has to be proved anew for each structure.

We compared the speed of our tactic with that of ring and field, and also 
with a similar tactic for the HOL Light system [11]. All these tactics have a 
comparable speed: our tactic is a bit faster than ring, but slower than field. 

1.5 Outline 

In Section 2 we summarize the methods of reflection and partial reflection. In 
Section 3 we describe in detail the normalization function of the rational tac- 
tic. Section 4 is a small detour where we generalize the same method to add 
uninterpreted function symbols to the expressions that rational understands. In 
Section 5 we show how to do reflection in an algebraic hierarchy in a hierarchical 
way. Finally in Section 6 we present a possibility to have even tighter integration 
in a hierarchical reflection tactic, which unfortunately turns out to require the 
so-called K axiom [14]. 

2 Reflection and Partial Reflection 

In this section we will briefly summarize [10]. That paper describes a gener- 
alization of the technique of reflection there called partial reflection. One can 
give a general account of reflection in terms of decision procedures, but here we 
will only present the more specific method of reflection with a normalization 
function, which is used to do equational reasoning. 

In the normal, “total”, kind of reflection one defines a type $E$ of syntactic
expressions for the domain $A$ that one is reasoning about, together with an
interpretation function

$[\![-]\!]_\rho : E \to A$

which assigns to a syntactic expression $e$ an interpretation $[\![e]\!]_\rho$. In this, $\rho$ is a
valuation that maps the variables in the syntactic expressions to values in $A$.
The type $E$ is an inductive type, and therefore it is possible to recursively define
a normalization function $\mathcal{N}$ on the type of syntactic expressions inside the type
theory (this is not possible for $A$; so the reason for introducing the type $E$ is to
be able to define this $\mathcal{N}$).

$\mathcal{N} : E \to E$

One then proves the correctness lemma stating that the normalization function
conserves the interpretation.

$[\![e]\!]_\rho =_A [\![\mathcal{N}(e)]\!]_\rho$

Then, to reason about the domain $A$ that $[\![-]\!]$ maps to, one first constructs a
valuation $\rho$ and syntactic expressions in $E$ which map under $[\![-]\!]_\rho$ to the terms
that one wants to reason about, and then one uses the lemma to do the equational
reasoning.

For instance, suppose that one wants to prove a =a b. One finds e, / and 
p with |e]p = a and |/]p = b. Now if Af{e) = Af{f) then we get a = |e]p =a 
|A/’(e)]p = |A/’(/)]p =A |/1p = b. (Clearly this uses the correctness lemma 
twice, see Figure 1.) Note that the operation of finding an expression in E that 
corresponds to a given expression in A (dotted arrows) is not definable in the 
type theory, and needs to be implemented outside of it. In a system like Coq it 
will be implemented in ML or in the tactic language Ctac described in [6] and 
[4, Chapter 9]. 
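The scheme above can be illustrated outside the type theory with a minimal Python sketch. Everything in it is an assumption made for illustration: the tuple encoding of $E$, the integers as the domain $A$, and a toy normalization function that merely sorts the summands of a sum. It is not the normalization function of the rational tactic.

```python
# Syntactic expressions E: ("var", name) | ("int", n) | ("add", e, f)

def interp(e, rho):
    """Interpretation [[e]]_rho, with the integers standing in for A."""
    tag = e[0]
    if tag == "var":
        return rho[e[1]]
    if tag == "int":
        return e[1]
    return interp(e[1], rho) + interp(e[2], rho)

def summands(e):
    """Flatten nested sums into the list of their leaves."""
    if e[0] == "add":
        return summands(e[1]) + summands(e[2])
    return [e]

def normalize(e):
    """Toy N : E -> E. Sort the leaves and re-nest the sum to the right."""
    leaves = sorted(summands(e))
    result = leaves[-1]
    for leaf in reversed(leaves[:-1]):
        result = ("add", leaf, result)
    return result

x, y = ("var", "x"), ("var", "y")
e = ("add", x, ("add", y, ("int", 1)))      # x + (y + 1)
f = ("add", ("add", ("int", 1), y), x)      # (1 + y) + x
rho = {"x": 5, "y": 7}
same_normal_form = normalize(e) == normalize(f)
```

Because `normalize(e)` and `normalize(f)` coincide syntactically, the two applications of the correctness lemma give `interp(e, rho) == interp(f, rho)` for every valuation, which is the chain of equalities described above.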



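[Fig. 1: diagram with $e \in E$ and $f \in E$ related by $\mathcal{N}(e) = \mathcal{N}(f)$, their interpretations $a$ and $b$ in $A$, and dotted arrows from $A$ back to $E$]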




Things get more interesting when the syntactic expressions in $E$ contain
partial operations, like division. In that case the interpretation $[\![e]\!]_\rho$ will not
always be defined. To address this we generalized the method of reflection to
partial reflection. The naive way to do this is to define a predicate

$$\mathit{wf}_\rho : E \to \mathsf{Prop}$$

that tells whether an expression is well-formed. Then the interpretation function
takes another argument of type $\mathit{wf}_\rho(e)$.

$$[\![-]\!]_\rho : \Pi_{e:E}.\ \mathit{wf}_\rho(e) \to A$$

The problem with this approach is that the definition of $\mathit{wf}$ needs the interpretation
function $[\![-]\!]$. Therefore the inductive definition of $\mathit{wf}$ and the recursive
definition of $[\![-]\!]$ need to be given simultaneously. This is called an inductive-recursive
definition. Inductive-recursive definitions are not supported by the Coq
system, and for a good reason: induction-recursion makes a system significantly
stronger. In set theory it corresponds to the existence of a Mahlo cardinal [8].
The solution from [10] for doing partial reflection without induction-recursion
is to replace the interpretation function with an inductively defined interpretation
relation.

$$]\![_\rho \;\subseteq\; E \times A$$

The relation $[\![e]\!]_\rho = a$ now becomes $e \mathrel{]\![_\rho} a$. It means that the syntactic expression
$e$ is interpreted under the valuation $\rho$ by the object $a$. The lemmas that one then
proves are the following.

$$e \mathrel{]\![_\rho} a \;\wedge\; e \mathrel{]\![_\rho} b \;\Rightarrow\; a =_A b$$

$$e \mathrel{]\![_\rho} a \;\Rightarrow\; \mathcal{N}(e) \mathrel{]\![_\rho} a$$

The first lemma states that the interpretation relation is functional, and the
second lemma is again the correctness of the normalization function. Note that
it is not an equivalence but just an implication. This is the only direction that
is needed. In fact, in our application the equivalence does not hold [*].
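Since the interpretation relation is functional, it can be modelled for illustration as a partial function. The following Python sketch does so under stated assumptions: the rationals (`fractions.Fraction`) stand in for the field $A$, expressions are encoded as tuples, and `None` means that the expression relates to no value. These names and encodings are hypothetical and are not taken from the Coq development.

```python
from fractions import Fraction

def interp(e, rho):
    """Return the value a with e ][_rho a, or None if e relates to no value."""
    tag = e[0]
    if tag == "int":
        return Fraction(e[1])
    if tag == "var":
        return rho[e[1]]
    a, b = interp(e[1], rho), interp(e[2], rho)
    if a is None or b is None:
        return None                      # a subexpression is not interpreted
    if tag == "add":
        return a + b
    if tag == "mul":
        return a * b
    if tag == "div":
        return None if b == 0 else a / b  # division by zero is not interpreted
    raise ValueError(f"unknown expression: {e!r}")

one, zero = ("int", 1), ("int", 0)
bad = ("div", one, ("div", one, zero))    # 1/(1/0)
good = ("div", zero, one)                 # 0/1
```

Here `interp(bad, {})` is `None` while `interp(good, {})` is `0`, which mirrors why only one direction of the second lemma can hold: an expression may have no interpretation although its normal form has one.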

For each syntactic expression $e$ that one constructs for an object $a$, one
also needs to find an inhabitant of the statement $e \mathrel{]\![_\rho} a$. In [10] types $E_\rho(a)$
of proof loaded syntactic expressions are introduced to make this easier. These
types correspond to the expressions that evaluate to $a$. They are mapped to the
normal syntactic expressions by a forgetful function

$$|-| : E_\rho(a) \to E$$

and they satisfy the property that for all $e$ in the type $E_\rho(a)$

$$|e| \mathrel{]\![_\rho} a.$$

In this paper we will not go further into this, although everything that we do
also works in the presence of these proof loaded syntactic expressions.

3 Normalization Function 

We will now describe how we defined the normalization function for our main 
example of rational expressions. Here the type E of syntactic expressions is given 
by the following grammar. 

$$E ::= Z \mid V \mid E + E \mid E \cdot E \mid E / E$$

In this grammar $Z$ denotes the integers, and $V$ is a countable set of variable names (in the
Coq formalization we use a copy of the natural numbers for this). Variables will
be denoted by $x, y, z$, integers by $i, j, k$. The elements of this type $E$ are just
syntactic objects, so they are a different kind of object from the values of these
expressions in specific fields. Note that in these expressions it is possible to divide
by zero: $0/0$ is one of the terms in this type.

[*] A simple example is $e = 1/(1/0)$, which does not relate to any $a$. Its normal form is
$0/1$, which interprets to $0$.

Other algebraic operations are defined as abbreviations of operations
that occur in the type. For instance, subtraction is defined by

$$e - f \equiv e + f \cdot (-1)$$



We will now describe how we define the normalization function $\mathcal{N}(e)$ that maps
an element of $E$ to a normal form. As an example, the normal form of $\frac{1}{x-y} + \frac{1}{x+y}$
is

$$\mathcal{N}\left(\frac{1}{x-y} + \frac{1}{x+y}\right) = \frac{x \cdot 2 + 0}{x \cdot x \cdot 1 + y \cdot y \cdot (-1) + 0}$$

This last expression is the “standard form” of the way one would normally write
this term, which is

$$\frac{2x}{x^2 - y^2}.$$

From this example it should be clear how the normalization function works:
it multiplies everything out until there is just a quotient of two polynomials
left. These polynomials are then in turn written in a “standard form”. The
expressions in normal form are given by the following grammar.

$$\begin{array}{lcl}
F & ::= & P / P \\
P & ::= & M + P \mid Z \\
M & ::= & V \cdot M \mid Z
\end{array}$$

In this grammar $F$ represents a fraction of two polynomials, $P$ are the polynomials
and $M$ are the monomials. One should think of $P$ as a “list of monomials”
(where $+$ is the “cons” and the integers take the place of the “nil”) and of $M$ as
a “list of variables” (where $\cdot$ is the “cons” and again the integers take the place
of the “nil”).
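The "multiply everything out" idea can be tried on the example above with a short Python sketch. It rests on assumptions that differ from the paper's presentation: a polynomial is stored as a dictionary from a sorted tuple of variable names to an integer coefficient (rather than as the nested lists of the grammar), and all function names are hypothetical.

```python
def p_add(p, q):
    """Sum of two polynomials; monomials with coefficient 0 are dropped."""
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return {m: c for m, c in out.items() if c != 0}

def p_mul(p, q):
    """Product of two polynomials; variables in each monomial stay sorted."""
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(sorted(m1 + m2))
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def normalize(e):
    """Return (numerator, denominator) for an expression built from
    ("int", n), ("var", name), ("add", e, f), ("mul", e, f), ("div", e, f)."""
    tag = e[0]
    if tag == "int":
        return ({(): e[1]} if e[1] else {}), {(): 1}
    if tag == "var":
        return {(e[1],): 1}, {(): 1}
    (n1, d1), (n2, d2) = normalize(e[1]), normalize(e[2])
    if tag == "add":
        return p_add(p_mul(n1, d2), p_mul(n2, d1)), p_mul(d1, d2)
    if tag == "mul":
        return p_mul(n1, n2), p_mul(d1, d2)
    if tag == "div":
        return p_mul(n1, d2), p_mul(d1, n2)
    raise ValueError(f"unknown expression: {e!r}")

x, y, one = ("var", "x"), ("var", "y"), ("int", 1)
x_minus_y = ("add", x, ("mul", y, ("int", -1)))   # x - y as x + y*(-1)
x_plus_y = ("add", x, y)
example = ("add", ("div", one, x_minus_y), ("div", one, x_plus_y))
numerator, denominator = normalize(example)
```

For the example, `numerator` is the polynomial $2x$ and `denominator` is $x^2 - y^2$, in agreement with the normal form displayed earlier. Note that division is purely syntactic here: no check is made that a denominator is nonzero.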

On the one hand we want the normalization function to terminate, but on the 
other hand we want the set of normal forms to be as small as possible. We achieve 
this by requiring the polynomials and monomials to be sorted; furthermore, no 
two monomials in a polynomial can have exactly the same set of variables. Thus 
normal forms for polynomials will be unique. 

For this we have an ordering of the variable names. So the “list” that is a 
monomial has to be sorted according to this order on V, and the “list” that is a 
polynomial also has to be sorted, according to the corresponding lexicographic 
ordering on the monomials. If an element of P or M is sorted like this, and 
monomials with the same set of variables have been collected together, we say 
it is in normal form. 

Now to define $\mathcal{N}$ we have to “program” the multiplying out of $E$ expressions
together with the sorting of monomials and polynomials, and collecting factors 
and terms. This is done simultaneously: instead of first multiplying out the 
expressions and then sorting them to gather common terms, we combine these 
two things. 
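We recursively define the following functions (using the Fixpoint operation
of Coq).

$$\begin{array}{lcl}
\cdot_{MZ} & : & M \times Z \to M \\
\cdot_{MV} & : & M \times V \to M \\
\cdot_{MM} & : & M \times M \to M \\
+_{MM} & : & M \times M \to M \\
+_{PM} & : & P \times M \to P \\
+_{PP} & : & P \times P \to P \\
\cdot_{PM} & : & P \times M \to P \\
\cdot_{PP} & : & P \times P \to P \\
+_{FF} & : & F \times F \to F \\
\cdot_{FF} & : & F \times F \to F \\
/_{FF} & : & F \times F \to F
\end{array}$$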

(Actually, these functions all have type

$$E \times E \to E$$

as we do not have separate types for $F$, $P$ and $M$. However, the idea is that
they will only be called with arguments that are of the appropriate shape and in
normal form. In that case the functions will return the appropriate normal form.
In the other case they will return any term that is equal to the sum or product
of the arguments; generally we just use the sum or product of the arguments.)
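For example, the multiplication function $\cdot_{MM}$ looks like

$$e \cdot_{MM} f := \begin{cases}
f \cdot_{MZ} i & \text{if } e = i \in Z \\
(e_2 \cdot_{MM} f) \cdot_{MV} e_1 & \text{if } e = e_1 \cdot e_2 \\
e \cdot f & \text{otherwise}
\end{cases}$$

and the addition function $+_{PM}$ is [*]

$$e +_{PM} f := \begin{cases}
i +_{MM} j & \text{if } e = i \in Z,\ f = j \in Z \\
f + i & \text{if } e = i \in Z \\
e_1 + (e_2 +_{PM} i) & \text{if } e = e_1 + e_2,\ f = i \in Z \\
e_2 +_{PM} (e_1 +_{MM} f) & \text{if } e = e_1 + e_2,\ e_1 = f \\
e_1 + (e_2 +_{PM} f) & \text{if } e = e_1 + e_2,\ e_1 <_{lex} f \\
f + e & \text{if } e = e_1 + e_2,\ e_1 >_{lex} f \\
e + f & \text{otherwise}
\end{cases}$$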

where the lexicographic ordering $<_{lex}$ is used to guarantee that the monomials
in the result are ordered.
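The two case definitions can be transcribed almost literally into Python, as in the sketch below. The encoding is an assumption made for illustration (variables are strings, integers are Python ints, products and sums are tagged tuples), and the helpers `mul_MZ`, `mul_MV` and `add_MM` are guesses at the behaviour of $\cdot_{MZ}$, $\cdot_{MV}$ and $+_{MM}$, whose definitions are not shown in the text. Arguments are assumed to be of the appropriate shape and in normal form.

```python
# A monomial such as x*y*2 is ("mul", "x", ("mul", "y", 2));
# a polynomial such as x*3 + 0 is ("add", ("mul", "x", 3), 0).

def is_mul(e):
    return isinstance(e, tuple) and e[0] == "mul"

def is_add(e):
    return isinstance(e, tuple) and e[0] == "add"

def vars_of(m):
    """Variables of a monomial as a list (the final integer is dropped)."""
    out = []
    while is_mul(m):
        out.append(m[1])
        m = m[2]
    return out

def mul_MZ(m, i):
    """Assumed helper: multiply the integer at the end of monomial m by i."""
    return ("mul", m[1], mul_MZ(m[2], i)) if is_mul(m) else m * i

def mul_MV(m, v):
    """Assumed helper: insert variable v into the sorted monomial m."""
    if is_mul(m) and m[1] < v:
        return ("mul", m[1], mul_MV(m[2], v))
    return ("mul", v, m)

def mul_MM(e, f):
    if isinstance(e, int):                      # e = i in Z
        return mul_MZ(f, e)
    if is_mul(e):                               # e = e1 * e2
        return mul_MV(mul_MM(e[2], f), e[1])
    return ("mul", e, f)                        # otherwise

def add_MM(e, f):
    """Assumed helper: add two monomials that have the same variables."""
    if is_mul(e):
        return ("mul", e[1], add_MM(e[2], f[2]))
    return e + f

def add_PM(e, f):
    if isinstance(e, int) and isinstance(f, int):
        return add_MM(e, f)
    if isinstance(e, int):
        return ("add", f, e)
    if is_add(e):
        e1, e2 = e[1], e[2]
        if isinstance(f, int):
            return ("add", e1, add_PM(e2, f))
        if vars_of(e1) == vars_of(f):           # equality as lists
            return add_PM(e2, add_MM(e1, f))
        if vars_of(e1) < vars_of(f):            # e1 <lex f
            return ("add", e1, add_PM(e2, f))
        return ("add", f, e)                    # e1 >lex f
    return ("add", e, f)                        # otherwise

xy2 = ("mul", "x", ("mul", "y", 2))
p = ("add", xy2, 0)                             # x*y*2 + 0
q = add_PM(p, ("mul", "x", 3))                  # insert x*3 in front
r = add_PM(q, ("mul", "x", ("mul", "y", 5)))    # merge with x*y*2
```

Python's built-in list comparison is used as the lexicographic ordering on variable lists, with variable names ordered alphabetically; this is one possible choice of ordering on $V$.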

Finally we used these functions to recursively “program” the normalization
function. For instance, the case where the argument is a division is defined as

$$\mathcal{N}(e/f) := \mathcal{N}(e) \mathbin{/_{FF}} \mathcal{N}(f).$$

[*] In the fourth case, the equality $e_1 = f$ is equality as lists, meaning that they might
differ in the integer coefficient at the end.

The base case (when $e$ is a variable $v$) looks like

$$\mathcal{N}(v) := \frac{v \cdot 1 + 0}{1}.$$

To prove that $a = b$, then, one builds the expression corresponding to $a - b$ and
checks that this normalizes to an expression of the form $0/e$. (This turns out
to be stronger than building expressions $e$ and $f$ interpreting to $a$ and $b$ and
verifying that $\mathcal{N}(e) = \mathcal{N}(f)$, since normal forms are in general not unique.)



4 Uninterpreted Function Symbols 



When one starts working with the tactic defined as above, one quickly finds out
that there are situations in which it fails because two terms which are easily seen
to be equal generate two expressions whose difference fails to normalize to 0. A
simple example arises when function symbols are used; for example, trying to
prove that

$$f(a + b) = f(b + a)$$

will fail because $f(a + b)$ will be syntactically represented as a variable $x$ and
$f(b + a)$ as a (different) variable $y$, and the difference between these expressions
normalizes to

$$\frac{x \cdot 1 + y \cdot (-1) + 0}{1},$$

which is not zero.

In this section we describe how the syntactic type $E$ and the normalization
function $\mathcal{N}$ can be extended to recognize and deal with function symbols. The
actual implementation includes unary and binary total functions, as well as
unary partial functions (these are binary functions whose second argument is a
proof) [*]. We will discuss the case for unary total functions in detail; binary and
partial functions are treated in an analogous way.

Function symbols are treated much in the same way as variables; thus, we
extend the type $E$ of syntactic expressions with a new countable set of function
variable names $V_1$, which is implemented (again) as the natural numbers. The
index 1 stands for the arity of the function; the original set of variables is now
denoted by $V_0$. Function variables will be denoted $u, v$.

$$E ::= Z \mid V_0 \mid V_1(E) \mid E + E \mid E \cdot E \mid E / E$$

Intuitively, the normalization function should also normalize the arguments of
function variables. The grammar for normal forms becomes the following.

$$\begin{array}{lcl}
F & ::= & P / P \\
P & ::= & M + P \mid Z \\
M & ::= & V_0 \cdot M \mid V_1(F) \cdot M \mid Z
\end{array}$$

[*] Other possibilities, such as ternary functions or binary partial functions, were not
considered because this work was done in the setting of the C-CoRN library, where
these are the types of functions which are used in practice.
But now a problem arises: the extra condition that both polynomials and monomials
correspond to sorted lists requires ordering not only variables in $V_0$, but
also expressions of the form $V_1(F)$. The simplest way to do this is by defining
an ordering on the whole set $E$ of expressions.

This is achieved by first ordering the sets $V_0$ and $V_1$ themselves. Then, expressions
are recursively sorted by first looking at their outermost operator

$$x <_E i <_E e + f <_E e \cdot f <_E e / f <_E v(e)$$

and then sorting expressions with the same operator using a lexicographic ordering.
For example, if $x <_{V_0} y$ and $u <_{V_1} v$, then

$$x <_E y <_E 2 <_E 34 <_E x/4 <_E u(x + 3) <_E u(2 \cdot y) <_E v(x + 3).$$

With this different ordering, the same normalization function as before can be
used with only trivial changes. In particular, the definitions of the functions $\cdot_{MM}$
and $+_{PM}$ remain unchanged. Only at the very last step does one have to add a
rule saying that

$$\mathcal{N}(v(e)) := \frac{v(\mathcal{N}(e)) \cdot 1 + 0}{1}.$$

Notice the similarity with the rule for the normal form of variables.
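One way to realize such an ordering is to map every expression to a nested tuple whose first component is the rank of the outermost operator, and to let tuple comparison do the lexicographic part. The Python sketch below does this under assumed conventions: expressions are tagged tuples, and variable and function names are ordered alphabetically (so that `x` precedes `y` and `u` precedes `v`). The name `order_key` is hypothetical.

```python
RANK = {"var": 0, "int": 1, "add": 2, "mul": 3, "div": 4, "fun": 5}

def order_key(e):
    """Sort key realizing <_E: operator rank first, then components."""
    tag = e[0]
    if tag in ("var", "int"):
        return (RANK[tag], e[1])
    if tag == "fun":                      # ("fun", name, argument)
        return (RANK[tag], e[1], order_key(e[2]))
    return (RANK[tag], order_key(e[1]), order_key(e[2]))

x, y = ("var", "x"), ("var", "y")
chain = [
    x,
    y,
    ("int", 2),
    ("int", 34),
    ("div", x, ("int", 4)),
    ("fun", "u", ("add", x, ("int", 3))),
    ("fun", "u", ("mul", ("int", 2), y)),
    ("fun", "v", ("add", x, ("int", 3))),
]
resorted = sorted(reversed(chain), key=order_key)
```

Sorting the reversed list with this key gives back `chain`, the example chain of inequalities stated above.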

The next step is to change the interpretation relation. Instead of the valuation
$\rho$, we now need two valuations

$$\rho_0 : V_0 \to A$$

$$\rho_1 : V_1 \to (A \to A)$$

and the inductive definition of the interpretation relation is extended with the
expected constructor for interpreting expressions of the form $v(e)$.

As before, one can again prove the two lemmas

$$e \mathrel{]\![_{\rho_0,\rho_1}} a \;\wedge\; e \mathrel{]\![_{\rho_0,\rho_1}} b \;\Rightarrow\; a =_A b$$

$$e \mathrel{]\![_{\rho_0,\rho_1}} a \;\Rightarrow\; \mathcal{N}(e) \mathrel{]\![_{\rho_0,\rho_1}} a$$

Our original equality $f(a + b) = f(b + a)$ can now easily be solved: $f(a + b)$
can be more faithfully represented by the expression $v(x + y)$, where $\rho_1(v) = f$,
$\rho_0(x) = a$ and $\rho_0(y) = b$; the syntactic representation of $f(b + a)$ becomes
$v(y + x)$; and each of these normalizes to

$$\frac{v\left(\frac{x \cdot 1 + y \cdot 1 + 0}{1}\right) \cdot 1 + 0}{1},$$

so that their difference normalizes to 0 as was intended.

Adding binary functions simply requires a new sort $V_2$ of binary function
symbols and extending the type of expressions to allow for the likes of $u(e, f)$; the
normalization function and the interpretation relation can easily be adapted, the
latter requiring yet another valuation

$$\rho_2 : V_2 \to (A \times A \to A).$$

Partial functions are added likewise, using a sort $V_P$ for partial function symbols
and a valuation

$$\rho_P : V_P \to (A \not\to A).$$

As was already the case with division, one can write down expressions like
$v(e)$ even when $\rho_P(v)$ is not defined at the interpretation of $e$; the definition
of $]\![_{\rho_0,\rho_1,\rho_P,\rho_2}$ ensures that only correctly applied partial functions will be
interpreted.

5 Hierarchical Reflection 

The normalization procedure described in Section 3 was used to define a tactic 
which would prove algebraic equalities in an arbitrary field in the context of the 
Algebraic Hierarchy of [9]. 

In this hierarchy, fields are formalized as rings with an extra operation (division)
which satisfies some properties; rings, in turn, are themselves Abelian
groups where a multiplication is defined also satisfying some axioms. The question
then arises of whether it is possible to generalize this mechanism to the
different structures of this Algebraic Hierarchy. This would mean having three
“growing” types of syntactic expressions $E_G$, $E_R$ and $E_F$ (where the indices
stand for groups, rings and fields respectively) together with interpretation
relations [*].

$$\begin{array}{ccl}
E_F & \mathrel{]\![^F_\rho} & F : \mathsf{Field} \\
\cup & & \\
E_R & \mathrel{]\![^R_\rho} & R : \mathsf{Ring} \\
\cup & & \\
E_G & \mathrel{]\![^G_\rho} & G : \mathsf{Group}
\end{array}$$

However, one can do better. The algorithm in the normalization function works
outwards; it first pushes all the divisions to the outside, and then proceeds
to normalize the resulting polynomials. In other words, it first deals with the
field-specific part of the expression, and then proceeds working within a ring.
This suggests that the same normalization function could be reused to define
a decision procedure for equality of algebraic expressions within a ring, thus
allowing $E_F$ and $E_R$ to be unified.

Better yet, looking at the functions operating on the polynomials one also
quickly realizes that these will never introduce products of variables unless they
are already implicitly in the expression (in other words, a new product expression
can arise e.g. from distributing a sum over an existing product, but if the
original expression contains no products then neither will its normal form). So
our previous picture can be simplified to this one.

[*] For simplicity we focus on the setting where function symbols are absent; the more
general situation is analogous.



[Diagram: the single expression type $E$ with three interpretation relations, into $F$ : Field, $R$ : Ring and $G$ : Group]



The key idea is to use the partiality of the interpretation relation to be able
to map $E$ into a ring $R$ or a group $G$. In the first case, expressions of the form
$e/f$ will not be interpreted; in the latter, neither these nor expressions of the
form $e \cdot f$ relate to any element of the group.

There is one problem, however. Suppose $x$ is a variable with $\rho(x) = a$; then
$a + a$ is represented by $x + x$, but

$$\mathcal{N}(x + x) = \frac{x \cdot 2 + 0}{1} \mathrel{]\![^G_\rho} a + a$$

does not hold.

In order to make sense of the normal forms defined earlier, one needs to
interpret the special cases $e/1$ in groups and rings, as well as $e \cdot f$ with $f = i \in Z$
in groups (assuming, of course, that $e$ can be interpreted).

The following table summarizes what each of the interpretation relations can 
interpret. 









             $]\![^G_\rho$     $]\![^R_\rho$     $]\![^F_\rho$
$v \in V$      yes             yes             yes
$i \in Z$      if $i = 0$      yes             yes
$e + f$        yes             yes             yes
$e \cdot f$    if $f \in Z$    yes             yes
$e / f$        if $f = 1$      if $f = 1$      if $f \neq 0$



In the last three cases the additional requirement that e and / can be interpreted 
is implicit. 
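The table can be read as the specification of three partial interpreters over one expression type. The Python sketch below illustrates this under assumptions that are not part of the paper: the rationals serve as carrier in all three cases (viewed as an additive group, a ring, or a field), `None` means "not interpreted", and in a group the integer factor of $e \cdot f$ and the denominator $1$ of $e/1$ are treated as syntactic literals that need no interpretation of their own.

```python
from fractions import Fraction

def interp(e, rho, kind):
    """Partial interpretation of e for kind in {"group", "ring", "field"}."""
    tag = e[0]
    if tag == "var":
        return rho[e[1]]
    if tag == "int":
        # In a group only the integer 0 is interpreted (as the unit).
        return Fraction(e[1]) if (kind != "group" or e[1] == 0) else None
    _, left, right = e
    a = interp(left, rho, kind)
    if a is None:
        return None
    if tag == "mul" and kind == "group":
        # e * i with an integer literal i: the i-fold sum of e.
        return a * right[1] if right[0] == "int" else None
    if tag == "div" and kind != "field":
        # Only the special case e/1 is interpreted in groups and rings.
        return a if right == ("int", 1) else None
    b = interp(right, rho, kind)
    if b is None:
        return None
    if tag == "add":
        return a + b
    if tag == "mul":
        return a * b
    if tag == "div":
        return a / b if b != 0 else None
    raise ValueError(f"unknown expression: {e!r}")

x = ("var", "x")
rho = {"x": Fraction(3)}
# The normal form (x*2 + 0)/1 of x + x.
nf = ("div", ("add", ("mul", x, ("int", 2)), ("int", 0)), ("int", 1))
```

With these special cases the normal form `nf` is interpreted in a group as $a + a$, whereas a genuine product such as `("mul", x, x)` is interpreted only in rings and fields.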

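Once again, one has to prove the lemmas

$$e \mathrel{]\![^G_\rho} a \;\wedge\; e \mathrel{]\![^G_\rho} b \;\Rightarrow\; a =_A b$$

$$e \mathrel{]\![^G_\rho} a \;\Rightarrow\; \mathcal{N}(e) \mathrel{]\![^G_\rho} a$$

and analogous ones for $]\![^R$ and $]\![^F$.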

In these lemmas, one needs to use the knowledge that the auxiliary functions 
will only be applied to the “right” arguments to be able to finish the proofs. 
This is trickier to do for groups than for rings and fields. For example, while
correctness of $\cdot_{MM}$ w.r.t. $]\![^F$ is unproblematic, as it states that

$$e \mathrel{]\![^F_\rho} a \;\wedge\; f \mathrel{]\![^F_\rho} b \;\Rightarrow\; e \cdot_{MM} f \mathrel{]\![^F_\rho} a \cdot b,$$

the analogue of this statement for $]\![^G$ cannot be written down, as $a \cdot b$ has
no meaning in a group. However, by definition of $]\![^F$, this is equivalent to the
following.

$$e \cdot f \mathrel{]\![^F_\rho} a \cdot b \;\Rightarrow\; e \cdot_{MM} f \mathrel{]\![^F_\rho} a \cdot b$$

Now this second version does possess an analogue for $]\![^G$, by replacing the
expression $a \cdot b$ with a variable.

$$e \cdot f \mathrel{]\![^G_\rho} c \;\Rightarrow\; e \cdot_{MM} f \mathrel{]\![^G_\rho} c.$$

This is still not provable, because $\cdot_{MM}$ can swap the order of its arguments. The
correct version is

$$e \cdot f \mathrel{]\![^G_\rho} c \;\vee\; f \cdot e \mathrel{]\![^G_\rho} c \;\Rightarrow\; e \cdot_{MM} f \mathrel{]\![^G_\rho} c;$$

the condition of this statement reflects the fact that the normalization function
will only require computing $e \cdot_{MM} f$ whenever either $e$ or $f$ is an integer.

The implementation of the tactic for the hierarchical case now becomes
slightly more sophisticated than the non-hierarchical one. When given a goal
$a =_A b$ it builds the syntactic representation of $a$ and $b$ as before, and then looks
at the type of $A$ to decide whether it corresponds to a group, a ring or a field.
Using this information the tactic can then call the lemma stating correctness of
$\mathcal{N}$ w.r.t. the appropriate interpretation relation.

Optimization 

As was mentioned in Section 3, normal forms for polynomials are unique, contrary
to what happens with field expressions in general. This suggests that,
when $A$ is a group or a ring, the decision procedure for $a =_A b$ can be simplified
by building expressions $e$ and $f$ interpreting respectively to $a$ and $b$ and
comparing their normal forms. Clearly, this is at most as time-consuming as the
previous version, since computing $\mathcal{N}(e - f)$ requires first computing $\mathcal{N}(e)$ and
$\mathcal{N}(f)$.

Also, since the normalization function was not defined in one go but by means of
the auxiliary functions presented earlier, it is possible to avoid using divisions
altogether when working in rings and groups by directly defining $\mathcal{N}'$ by e.g.

$$\mathcal{N}'(e + f) = \mathcal{N}'(e) +_{PP} \mathcal{N}'(f);$$

the base case now looks like

$$\mathcal{N}'(v) = v \cdot 1 + 0.$$

Notice that although we now have two different normalization functions we still
avoid duplication of the code, since they are both defined in terms of the same
auxiliary functions and these are where the real work is done.

6 Tighter Integration

In the previous section we managed to avoid having different syntactic expres- 
sions for the different kinds of algebraic structures. We unified the types of 
syntactic expressions into one type E. 

However, we still have different interpretation relations $]\![^F$, $]\![^R$ and $]\![^G$. We
will now analyze the possibility of unifying those relations into one interpretation
relation $]\![^S$. This turns out to be possible, but when one tries to prove the relevant
lemmas for it one runs into problems: to get the proofs finished one needs to
assume an axiom (in the type theory of Coq).

Every field, ring or group has an underlying carrier. We will write A for the
carrier of an algebraic structure A. We now define an interpretation relation $]\![^S$
from the type of syntactic expressions $E$ to an arbitrary set [*] $S$, where that set is
a parameter of the inductive definition. This inductive definition quantifies over
different kinds of algebraic structures in the clauses for the different algebraic
operations. For instance the inductive clause for addition quantifies over groups.

$$\Pi_{G:\mathsf{Group}}\ \Pi_{e,f:E}\ \Pi_{a,b,c:G}.\ (a +_G b =_G c) \to (e \mathrel{]\![^G_\rho} a) \to (f \mathrel{]\![^G_\rho} b) \to (e + f \mathrel{]\![^G_\rho} c)$$

With this definition the diagram becomes the following.

[Diagram: the expression type $E$ with a single interpretation relation into $F$ : Field, $R$ : Ring and $G$ : Group]
This gives a nice unification of the interpretation relations. However, when one 
tries to prove the relevant lemmas for it in Coq, the obvious way does not work. 
To prove e.g. 

$$e \mathrel{]\![^S_\rho} a \;\wedge\; e \mathrel{]\![^S_\rho} b \;\Rightarrow\; a =_A b$$

one needs to use inversion with respect to the inductive definition of $]\![^S_\rho$ to get
the possible ways that $e \mathrel{]\![^S_\rho} a$ can be obtained; but the inversion tactic of Coq
then only produces an equality between dependent pairs where what one needs
is equality between the second components of those pairs. In Coq this is not
derivable without the so-called K axiom, which states uniqueness of equality
proofs [13].

    forall (A: Set) (x:A) (p:(x=x)), p = refl_equal A x

We did not want to assume an axiom to be able to have our tactic prove equalities
in algebraic structures that are clearly provable without this axiom. For this
reason we did not fully implement this more integrated version of hierarchical
reflection.

[*] In the formalization we actually have setoids instead of sets, but that does not make
a difference.

7 Conclusion 

7.1 Discussion 

We have shown how the rational tactic (first described in [10]), which is used 
to prove equalities of expressions in arbitrary fields, can be generalized in two 
distinct directions. 

First, we showed in Section 4 how this tactic could be extended so that
it would also look at the arguments of functions; the same mechanism can be
applied not only to unary total functions, as explained, but also to binary (or
n-ary) functions, as well as to partial functions as defined in the C-CoRN library [3].

In Section 5 we discussed how the same syntactic type $E$ and normalization
function $\mathcal{N}$ could be reused to define similar tactics that will prove equalities
in arbitrary rings or commutative groups. The work described here has been
successfully implemented in Coq, and is intensively used throughout the whole
C-CoRN library.

Further extensions of this tactic are possible; in particular, the same ap- 
proach easily yields a tactic that will work in commutative monoids (e.g. the 
natural numbers with addition). For simplicity, and since this adds nothing to 
this presentation, this situation was left out of this paper. 

Extending the same mechanism to non-commutative structures was not con- 
sidered. The normalization function intensively uses commutativity of both ad- 
dition and multiplication, so it cannot be reused for structures that do not satisfy 
these; and the purpose of this work was to reuse as much of the code needed for 
rational as possible. 

The correctness of the normalization function w.r.t. the interpretation relation
had to be proved three times, once for each type of structure. In Section 6 we
showed one possible way of overcoming this, which unfortunately failed because
proving correctness of the tactic would then require assuming an axiom which
is not needed to prove the actual equalities that the tactic is meant to solve. It
would be interesting to know whether this approach can be made to work without
needing the K axiom. Though this axiom is required to prove these lemmas
using inversion, there might be an alternative way to prove them that avoids this
problem.

A different approach to the same problem would be to use the constructor
subtyping of [2]. This would allow one to define e.g. the interpretation relation for
rings $]\![^R$ by adding one constructor to that for groups $]\![^G$; proving the relevant
lemmas for the broader relation would then only require proving the new case
in all the inductive proofs instead of duplicating the whole code.

Another advantage of this solution, when compared to the one explored in
Section 6, would be that the tactic could be programmed and used for e.g.
groups before rings and fields were even defined. It would also be more easily
extendable to other structures. Unfortunately, constructor subtyping for Coq is
at the moment only a theoretical possibility which has not been implemented.

Abstract. Embedded computing systems have always been significantly
more diverse than their mainstream microprocessor counterparts, but
they have also been relatively simpler to design and validate. Given
the current global fascination with ubiquitous information and communication
services in a highly mobile world, simplicity is rapidly disappearing.
Advanced perception systems such as speech and visual feature
and gesture recognizers, and 3G and 4G cellular telephony algorithms, cannot
currently be run in real time on performance microprocessors, let
alone at a power budget commensurate with mobile embedded devices.
This talk will describe an architectural approach to embedded systems
which outperforms performance microprocessors while consuming less
power than current embedded systems for the above applications. This
approach will be used as a way to highlight new issues of correctness
in embedded systems. Namely, correctness applies to functional, energy
consumption, and real-time processing constraints. The fact that these issues
become even more critical as technology scales makes life even more
complex. In order to deal with these hard problems, system architects
are creating new system models where the application, operating system,
and hardware interact in new ways to collaboratively manage the computational
rates and energy consumption. This new system model generates
a new set of validation problems that will become critical roadblocks to
progress in advanced embedded systems of the future. The talk will conclude
with a description of validation challenge problems inspired by the
new system model.

Abstract. We present an account of rippling with proof critics suitable
for use in higher order logic in Isabelle/IsaPlanner. We treat issues not
previously examined, in particular regarding the existence of multiple
annotations during rippling. This results in an efficient mechanism for
rippling that can conjecture and prove needed lemmas automatically as
well as present the resulting proof plans as Isar style proof scripts.



1 Introduction 

Rippling [5] is a rewriting technique that employs a difference removal heuris- 
tic to guide the search for proof. Typically, it is used to rewrite the step case 
in a proof by induction until the inductive hypothesis can be applied. Within 
the context of proof planning [4], this technique has been used in a variety 
of domains including the automation of hardware verification [6], higher order 
program synthesis [13], and more recently to automate proofs in nonstandard 
analysis [14]. 

In this paper we describe a higher order version of rippling which has been 
implemented for the Isabelle proof assistant [15] using the IsaPlanner proof 
planner [9]. We believe this is the first time that rippling with a proof critics 
mechanism has been implemented outside the Clam family of proof planners. 
Our account bears similarity to that presented by Smaill and Green [19], but 
uses a different mechanism for annotating differences more closely related to 
rippling in first order domains. It also exposes and treats a number of issues 
not previously examined regarding situations where multiple embeddings and 
annotations are possible. This leads to an efficient implementation of rippling. 

This work is also of particular interest to Isabelle users as it provides improved
automation and a means of conjecturing and proving needed lemmas, as well as
automatically generating Isar proof scripts [20].

The structure of the paper is as follows: in the next section, we give a brief 
introduction to IsaPlanner. In Sections 3 and 4, we introduce static rippling 
and dynamic rippling. In Section 5, we describe the version of rippling imple- 
mented in IsaPlanner and then outline, in Section 6, a technique that combines 
rippling with induction, and some proof critics. We present an example applica- 
tion in the domain of ordinal arithmetic in Section 7, and some further results 
in Section 8. Finally, Sections 9 and 10 describe related work and present our 
conclusions and future work. 

2 IsaPlanner

IsaPlanner [*] is a generic framework for proof planning in the interactive theorem
prover Isabelle. It facilitates the encoding of reasoning techniques, which can
be used to conjecture and prove theorems automatically. A salient characteristic
of IsaPlanner is its derivation of fully formal proofs, expressed in readable Isar
style proof scripts as part of the proof planning process.

Proof planning in Isabelle/IsaPlanner is split into a series of reasoning
states which capture ‘snapshots’ of the planning process. Each reasoning state 
contains the current partial proof plan, the next reasoning technique to be ap- 
plied, and any appropriate contextual information. Reasoning techniques are 
encoded as functions from a reasoning state to a sequence of reasoning states, 
where each state in the resulting sequence represents a possible way in which 
the technique can be applied. This encoding of techniques allows the reasoning 
process to be decomposed into steps which are evaluated in a ‘lazy’ fashion. 

The contextual information captures any knowledge that might be applicable 
to the current proof process and can be modified during proof planning. Contex- 
tual information also facilitates the design and definition of reasoning techniques 
by providing a data structure to hold knowledge derived during proof planning. 
Examples of such information include a conjecture database, annotations for 
rippling, and a high level description of the proof planning process. 

Proof planning is performed by searching through the possible ways a reasoning
technique can be applied. It terminates when a desired reasoning state is
found, or when the search space is exhausted. Search mechanisms such as Depth
First, Iterative Deepening, Breadth First and Best First have been implemented
in IsaPlanner. Moreover, search strategies can be attached to a technique
and used locally within its application. This allows us to take advantage of the
heuristic measure given by rippling to choose the ‘most promising’ future state
by using best first search, for example.
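The encoding of techniques as functions from a reasoning state to a lazily evaluated sequence of states can be mimicked with Python generators. The sketch below is purely illustrative: the state representation, the toy technique `subtract` and the function `depth_first` are hypothetical and do not reflect the actual IsaPlanner interfaces (which are written in ML).

```python
from typing import Callable, Iterator, Optional, Tuple

State = Tuple[int, Tuple[int, ...]]        # toy state: (remaining goal, trace)
Technique = Callable[[State], Iterator[State]]

def subtract(state: State) -> Iterator[State]:
    """Toy technique: each yielded state is one possible way to apply it."""
    n, trace = state
    for step in (3, 2):
        if n - step >= 0:
            yield (n - step, trace + (step,))

def depth_first(start: State, technique: Technique,
                is_solved: Callable[[State], bool],
                limit: int = 1000) -> Optional[State]:
    """Depth-first search; successor states are only produced on demand."""
    stack = [iter([start])]
    visited = 0
    while stack and visited < limit:
        state = next(stack[-1], None)
        if state is None:                  # this branch is exhausted
            stack.pop()
            continue
        visited += 1
        if is_solved(state):
            return state
        stack.append(technique(state))     # generator: nothing evaluated yet
    return None

result = depth_first((7, ()), subtract, lambda s: s[0] == 0)
```

Because each technique returns a generator, alternative applications are computed only when the search backtracks into them, which is the point of the lazy encoding. Other strategies (breadth first, best first) would differ only in how the pending states are stored and selected.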



3 An Introduction to Rippling 

While there are many variations of rippling [5], the central principle is to remove 
the differences between all or part of a goal and some defined skeleton constructed 
from the inductive hypothesis or, in some cases, from another assumption or 
theorem. Through the removal of this difference, the assumption or theorem 
that was employed to construct the skeleton can then be used to solve the goal 
in a process termed fertilisation. Thus rippling gives a direction to the rewriting 
process. 

The difference removal is facilitated by specialised annotations on the goal
known as wave fronts, wave holes, and sinks. More specifically, wave fronts
indicate differences between the skeleton and the goal, while wave holes identify
subterms inside the wave fronts that are similar to parts of the goal. Sinks,
for their part, indicate positions in the skeleton that correspond to universally
quantified variables and towards which wave fronts can be moved before being
eventually discarded. Fertilisation is possible when the wave fronts have been
removed from a subterm matching the skeleton, or placed in sinks appropriately.
Thus, there are two directions rippling can pursue:

- rippling-out: tries to remove the differences, or move them to the top of the
  term tree, thereby allowing fertilisation in a subterm,
- rippling-in: tries to move the differences into sinks, as discussed above.

¹ http://isaplanner.sourceforge.net/



As an example, consider the skeleton $\forall b.\ a + b = b + a$. The term
$Suc(a) + b = Suc(b + a)$ can then be annotated as:

$$\boxed{Suc(\underline{a})}^{\uparrow} + \lfloor b \rfloor = \boxed{Suc(\underline{\lfloor b \rfloor + a})}^{\downarrow}$$

The boxes indicate the wave fronts, and the underlined subterms are the wave holes.
The up and down arrows indicate rippling outward and inward respectively, and
the annotations $\lfloor b \rfloor$ indicate that b is at the location of a sink.

To provide rippling with a direction and to ensure its termination, a measure 
is used that decreases each time the goal is rewritten. The measure is a pair 
of lists of natural numbers that indicates the number of wave fronts (outward 
and inward) at each depth in the skeleton term. The outward list is obtained by 
counting the number of outward wave fronts from leaf to root and the inward 
list by tallying the inward ones from root to leaf. For example, the term tree for 
the annotation shown earlier is as follows:

[Figure: term tree of the annotated goal, with columns Out and In giving the
number of outward and inward wave fronts at each depth.]

which results in the measure ([1, 0, 0], [0, 1, 0]). Such measures are compared
lexicographically, as if they were a single list starting with the outward elements.
This provides a mechanism that allows wave fronts to move from out to in but not
vice versa.
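The measure and its ordering can be sketched in a few lines of Python. This is only an illustration of the definition above, not IsaPlanner code: the names make_measure, flatten and decreases are assumptions, and the wave-front counts are supplied by hand rather than computed from an annotated term.

```python
from typing import List, Tuple

# A measure is a pair (outward counts, inward counts), as in the text.
Measure = Tuple[List[int], List[int]]


def make_measure(out_by_depth: List[int], in_by_depth: List[int]) -> Measure:
    """Build a measure from wave-front counts indexed by depth (root = 0)."""
    # Outward fronts are listed from leaf to root, inward ones from root to leaf.
    return (list(reversed(out_by_depth)), list(in_by_depth))


def flatten(m: Measure) -> List[int]:
    """View the pair as a single list that starts with the outward elements."""
    return m[0] + m[1]


def decreases(new: Measure, old: Measure) -> bool:
    """True if `new` is strictly below `old` in the lexicographic order."""
    # Python compares lists lexicographically, which is the order we need.
    return flatten(new) < flatten(old)


# The annotated goal above: one outward front at depth 2, one inward at depth 1.
example = make_measure(out_by_depth=[0, 0, 1], in_by_depth=[0, 1, 0])
```

Because the outward elements come first, replacing an outward wave front by an inward one always lowers the measure, whereas the reverse replacement never does.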



3.1 Static Rippling 

We will refer to the rippling mechanism described by Bundy et al. [5], as static 
rippling. In this, measure decreasing annotated rewrite rules, called wave rules, 
are generated from axioms and theorems before rippling is performed. These 
wave rules are then applied blindly to rewrite the goal. If, at some point in the 
proof, no wave rules apply and the goal cannot be fertilised, then the goal is 
said to be blocked. This typically indicates that some backtracking is required, 
or that a lemma is needed. 

In static rippling, annotations are expressed at the object level by inserting 
object level function symbols (identity functions) for wave fronts and wave holes. 
For example, the function symbols wfout, wfin and wh may be used to represent
outward wave fronts, inward wave fronts and wave holes respectively. The annotated
term $p(\boxed{g(\underline{c})}^{\uparrow})$, for instance, can be represented using p(wfout(g(wh(c)))).



Many wave rules can be created from a single theorem; in general, their number is
exponential in the size of the term. However, once wave rules are generated, fast
rule selection can be performed by using discrimination nets [7], for example.
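To make the object level encoding concrete, the following Python sketch recovers both the plain goal and the skeleton from a term annotated with wfout, wfin and wh. The tuple encoding of terms and the helper names erase, holes and skeleton are assumptions made for illustration; the sketch also assumes that every wave front contains exactly one wave hole and that wave fronts are not nested.

```python
FRONTS = {"wfout", "wfin"}  # identity functions marking wave fronts


def erase(t):
    """Drop all annotation symbols, giving the plain goal term."""
    head, *args = t
    if head in FRONTS or head == "wh":
        return erase(args[0])
    return (head, *[erase(a) for a in args])


def holes(t):
    """Collect the wave holes found inside the body of a wave front."""
    head, *args = t
    if head == "wh":
        return [args[0]]
    if head in FRONTS:
        return []  # nested wave fronts are outside this sketch
    return [h for a in args for h in holes(a)]


def skeleton(t):
    """Replace every wave front by the content of its single wave hole."""
    head, *args = t
    if head in FRONTS:
        found = holes(args[0])
        assert len(found) == 1, "sketch assumes one wave hole per wave front"
        return skeleton(found[0])
    return (head, *[skeleton(a) for a in args])


# p(wfout(g(wh(c)))) from the text; constants are 1-tuples such as ("c",).
annotated = ("p", ("wfout", ("g", ("wh", ("c",)))))
```

Here erase(annotated) is the encoding of p(g(c)) and skeleton(annotated) is the encoding of p(c).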

We now present a simple example of static rippling that considers the step 
case in an inductive proof of the commutativity of addition (a + b = b + a) in 
Peano arithmetic. We will use the following wave rules: 




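$$\boxed{Suc(\underline{X})}^{\uparrow} + Y \Rightarrow \boxed{Suc(\underline{X + Y})}^{\uparrow} \qquad (1)$$

$$X + \boxed{Suc(\underline{Y})}^{\uparrow} \Rightarrow \boxed{Suc(\underline{X + Y})}^{\uparrow} \qquad (2)$$

A rippling proof of the step case uses the inductive hypothesis as the skeleton
with which to annotate the goal:

$$\boxed{Suc(\underline{a})}^{\uparrow} + \lfloor b \rfloor = \lfloor b \rfloor + \boxed{Suc(\underline{a})}^{\uparrow}$$

Ripple using wave rule: 1

$$\boxed{Suc(\underline{a + \lfloor b \rfloor})}^{\uparrow} = \lfloor b \rfloor + \boxed{Suc(\underline{a})}^{\uparrow}$$

Ripple using wave rule: 2

$$\boxed{Suc(\underline{a + \lfloor b \rfloor})}^{\uparrow} = \boxed{Suc(\underline{\lfloor b \rfloor + a})}^{\uparrow}$$

Fertilise using the inductive hypothesis.

$$Suc(b + a) = Suc(b + a)$$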



This shows how rippling can be used to guide a proof by induction. A formal 
account for static rippling in first order logic has been developed by Basin and 
Walsh [1]. They observe that if the normal notion of substitution is used, then it 
is possible for rewriting to produce strange annotations that do not correspond to 
the initial skeleton. The resulting effect is that rippling may no longer terminate 
but, even if it does so successfully, due to the changed skeleton, fertilisation may 
not be possible. 

For an example of incorrect annotation consider the following:

[Equation: an annotated rule and goal, which has the skeleton … g(k(z))).]

The goal rewrites to

[Equation: annotated term.]

which does not even have a well defined skeleton.



To avoid these problems, Basin and Walsh provide and use a modified notion
of substitution for their calculus of rippling. If such an approach were taken when
working in a theorem prover such as Isabelle or HOL, where any extra-logical
work must be verified within the logical kernel, then rippling steps would have
to be repeated by the theorem prover once rippling is successful.



4 Dynamic Rippling with Embedding 

An alternative approach to annotations for rippling is taken by Smaill and 
Green [19], and used to automate proofs in the domain of ordinal arithmetic 
by Dennis and Smaill [8]. Their approach avoids the need for a modified notion 
of substitution by recomputing the possible annotations each time a rule is applied.
We call this dynamic rippling. The key feature of dynamic rippling is that
the annotations are stored separately from the goal and are recomputed each 
time the goal is rewritten. 

The central motivation for dynamic rippling, as noted by Smaill and Green, 
arises from problems with object level annotations when working in the lambda 
calculus. In particular: 

- object level annotations are not stable over beta reduction. In particular, if
  the wave fronts are expressed at the object level, then it is not possible to
  use pre-annotated rules as they may not be skeleton preserving after beta
  reduction.

- in a context with meta variables, incorrect annotations can accidentally be
  introduced by unification.

In the setting of the lambda calculus, it is not clear how beta reduction 
could be redefined to get the desired properties for rippling. Furthermore, we 
are interested in a generic approach to rippling that can be used across logics 
without redefining substitution.



4.1 Embeddings for Annotating Difference 

Smaill and Green use embedding trees to represent the difference annotations 
used in rippling [19]. However, their work leaves a number of open questions 
regarding what direction to give wave fronts in an embedding, and what to do 
when the skeleton can be embedded into the goal in more than one way. 

Additionally, we observe that the embedding of a bound variable is not restricted
by its associated quantifier. For example, an embedding is possible from
the term $\forall x.\ \exists y.\ P(x, y)$ into $\forall a.\ \exists b.\ \forall c.\ P(a, c)$, where y is existentially
quantified in the skeleton, but embedded into c, which is universally quantified. We
believe that this is due to the lack of a well defined relationship between the
annotations for difference and the underlying semantics. However, in practice 
this is rarely an issue and we have not found any domains where this causes a 
problem. 

Nonetheless, if rippling is to be used in a domain with many different quan- 
tifiers then it may be worthwhile to impose further restrictions on embeddings. 
For example, by requiring, for each quantified variable being embedded, that the 
quantifier in the skeleton and the goal should be identical, or that the quantifier 
in the skeleton should embed into the quantifier in the goal. Such constraints 
would prune the search space and bring a closer semantic relationship between 
the embedding of bound variables and their quantifiers. 
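The fact that a skeleton may embed into a goal in several ways can be illustrated with a small Python sketch. It covers first order terms only and ignores bound variables, quantifiers and sinks, so it is a simplification of the embeddings discussed above; the tuple encoding and the function name count_embeddings are assumptions.

```python
from math import prod


def count_embeddings(skel, goal) -> int:
    """Count the ways `skel` embeds into `goal` (first order terms only)."""
    s_head, *s_args = skel
    g_head, *g_args = goal
    total = 0
    # Case 1: the heads coincide and the arguments embed pairwise.
    if s_head == g_head and len(s_args) == len(g_args):
        total += prod(count_embeddings(s, g) for s, g in zip(s_args, g_args))
    # Case 2: the head of the goal is part of a wave front, and the
    # skeleton embeds into one of the goal's arguments.
    total += sum(count_embeddings(skel, g) for g in g_args)
    return total


a, b = ("a",), ("b",)
skel = ("=", ("+", a, b), ("+", b, a))                    # a + b = b + a
goal = ("=", ("+", ("Suc", a), b), ("+", b, ("Suc", a)))  # Suc(a) + b = b + Suc(a)
```

The skeleton a + b = b + a embeds into the step case goal in exactly one way, whereas the skeleton a embeds into a + a in two ways, each of which would give rise to its own annotations.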

5 Rippling in IsaPlanner 

We now describe our version of rippling and its treatment of multiple annota- 
tions. We use dynamic rippling which avoids redefinition of substitution and is 
suitable for use in higher order logics. Before rippling starts, theorems and ax- 
ioms are added to a wave rule set which will be used during the process. We 
do not use all theorems and axioms during rippling for reasons described in 
Section 5.2. 

Given a wave rule set, our version of rippling is composed of three parts: 

1. Setup: Rippling is given a skeleton with which to create an initial list of
   possible annotations. We use the contextual information of IsaPlanner to
   store the annotations for rippling and keep track of the associated goal. This
   information also facilitates the later development of proof planning critics
   that can use the annotations to patch failed proof attempts, as described in
   the work of Ireland and Bundy [12].

2. Ripple Steps: Theorems in the wave rule set are used to perform a single
   step of rewriting on the goal. Note that the order in which the rules are applied
   is irrelevant as the rewriting process is guided by the rippling measure.
   After each successful rule application, the goal is beta-reduced and a new set
   of annotations is created. If this set is empty then the rewrite is considered
   to be an invalid ripple step and another rule is tried.

3. Fertilisation: When no more rules apply, rippling has either completed
   successfully, allowing fertilisation, or failed. Upon failure, our version of rippling
   either applies a proof critic, discussed in Section 6.1, or backtracks and tries
   rippling with different wave rules.

We note that in general, the open problems with dynamic rippling arise be- 
cause there are many ways to embed a skeleton in a goal and, for each embedding, 
there are a number of ways in which it can be annotated. Thus each goal is as- 
sociated with a set of annotations, rather than a single annotation, as was the 
case in static rippling. Further problems arise when rippling inward, computing 
the measure, and when deciding which rules to use for rippling. In the follow- 
ing subsections we describe how our version of dynamic rippling addresses these 
issues. 
5.1 Depth for Measures and Inward Rippling

To avoid a large number of possible annotations, inward wave fronts are typi- 
cally restricted to being placed above a subterm that contains a sink. However, 
in higher order abstract syntax (HOAS) the idea of ‘above’ or ‘below’ is not 
immediately obvious as function symbols are leaf nodes in the term tree. We 
address this by defining a suitable notion of depth which removes the need for 
product types as used by Smaill and Green [19]. An advantage of our approach 
is that users are free to use a curried representation with a notion of measure 
similar to that used in first order static rippling. 

The central idea is to treat depth in the following way: if x has depth d in
the term u, then x has depth d in λy.u and in app(u, v) (the HOAS application of
u to v), and x has depth d+1 in app(v, u). This ‘uncurries’ the syntax in the
way we would expect: no height ordering is given to different curried arguments
of a function. For example, the term Suc(a) + b, expressed in the HOAS as
app(app(+, app(Suc, a)), b), gives a depth of 0 to +, 1 to Suc and b, and 2 to a.
In contrast, the usual notion of depth in HOAS is 1 for b, 2 for +, and 3 for Suc
and a.
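The two notions of depth can be compared directly with a short Python sketch. The encoding of HOAS terms as nested tuples ("app", u, v) and ("lam", y, u), with strings for atoms, and the function names rippling_depths and tree_depths are assumptions made for illustration; atoms are assumed to have distinct names.

```python
def rippling_depths(term, depth=0, acc=None):
    """Depth as defined above: only the argument of an application is deeper."""
    acc = {} if acc is None else acc
    if isinstance(term, str):          # constant, free or bound variable
        acc[term] = depth
    elif term[0] == "lam":             # ("lam", y, u): same depth as the body
        rippling_depths(term[2], depth, acc)
    else:                              # ("app", u, v)
        rippling_depths(term[1], depth, acc)      # function part keeps depth
        rippling_depths(term[2], depth + 1, acc)  # argument is one deeper
    return acc


def tree_depths(term, depth=0, acc=None):
    """The usual depth in the term tree: every child is one level deeper."""
    acc = {} if acc is None else acc
    if isinstance(term, str):
        acc[term] = depth
    elif term[0] == "lam":
        tree_depths(term[2], depth + 1, acc)
    else:
        tree_depths(term[1], depth + 1, acc)
        tree_depths(term[2], depth + 1, acc)
    return acc


# Suc(a) + b, written as app(app(+, app(Suc, a)), b)
term = ("app", ("app", "+", ("app", "Suc", "a")), "b")
```

For this term rippling_depths gives 0 to +, 1 to Suc and b, and 2 to a, while tree_depths gives 1 to b, 2 to +, and 3 to Suc and a, matching the figures quoted above.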



5.2 Selection of the Wave Rule Set 

It is often cited as one of the advantages of rippling that the annotation process 
provides a means of ensuring termination and that therefore all resulting rules 
can be added to the set of wave rules. In static rippling, only measure decreasing 
wave rules are created. This avoids rewrites which have no valid annotation such 
as x ⇒ 0 + x.

However, recall that in dynamic rippling, theorems are used to rewrite the
goal and then the possible annotations are checked in order to avoid goals where
the measure does not decrease. Unfortunately, this approach can cause rules
that are not beneficial but frequently applicable, such as x ⇒ 0 + x, to slow
down search.

To avoid this, we filter the possible ways a theorem can be used to rewrite a
goal, removing those with a left hand side that is identical to a subterm of the
right, such as x ⇒ x + 0. We also remove any rewrites that would introduce a
new variable, such as 1 ⇒ x^0. While this solution does not correspond exactly
to the first order case, it works well in practice.
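The two filtering conditions can be sketched as a simple check on an oriented rewrite. The Python below is illustrative only: terms are nested tuples (head, arg1, ...), atoms are strings, variables are strings starting with "?", and the name keep_rewrite is an assumption rather than part of IsaPlanner.

```python
def subterms(t):
    """Yield t and all of its proper subterms (head symbols are not terms)."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)


def variables(t):
    """The set of variables (strings starting with '?') occurring in t."""
    return {s for s in subterms(t) if isinstance(s, str) and s.startswith("?")}


def keep_rewrite(lhs, rhs) -> bool:
    """Decide whether the oriented rewrite lhs => rhs stays in the rule set."""
    # Drop rewrites whose left hand side is identical to a subterm of the right.
    if any(lhs == s for s in subterms(rhs)):
        return False
    # Drop rewrites that would introduce a new variable.
    if not variables(rhs) <= variables(lhs):
        return False
    return True
```

With this check, x ⇒ x + 0 and 1 ⇒ x^0 are rejected, while Suc(x) + y ⇒ Suc(x + y) and x + 0 ⇒ x are kept.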



5.3 A Richer Representation of Annotations 



Smaill and Green represent annotations using embeddings. However, this does
not correspond directly to the first order account of rippling annotations given
by Basin and Walsh. In particular, annotations such as
$\boxed{f(\boxed{g(\underline{x})}^{\downarrow})}^{\uparrow}$ cannot be
expressed with their embedding representation.

In order to maintain a flexible and efficient mechanism for annotated terms,
we use a different representation (shown in Fig 1) that holds more information
than the embedding trees used by Smaill and Green². This allows multiple adjacent
wave fronts with different orientations.

aterm = aAbs(type, aterm, annot)
      | aApp(aterm, aterm, annot)
      | aConst(Const, annot)
      | aVar(Var, annot)
      | aBound(Bound, annot)

Fig. 1. A datatype to express annotated terms (aterm). The types: annot expresses
an annotation, which is typically either in, out or none; type is the type of a bound
variable; Const is a constant, Var is a variable, and Bound is a bound variable using
de Bruijn indices.

Using our annotations, the above
example would then be expressed as

aApp(aConst(f,out),(aApp(aConst(g,in),aVar(x,none),in)),out). 

The extra information held in this representation provides an easy way to 
experiment with different measures and mechanisms for annotation. Addition- 
ally, combined with the depth mechanism described in the previous section, our 
version of annotated terms produces measures similar to those of first order rippling
even when working with curried style functions. 
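As a rough illustration, the datatype of Fig. 1 can be mirrored with Python dataclasses. This is a sketch under stated assumptions, not the IsaPlanner implementation: annotations are plain strings, types and constants are represented by their names, and the helper leaf_annotations is a hypothetical function that simply lists the annotation attached to each leaf.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Annot = str  # typically "in", "out" or "none"


@dataclass(frozen=True)
class AConst:
    name: str
    annot: Annot


@dataclass(frozen=True)
class AVar:
    name: str
    annot: Annot


@dataclass(frozen=True)
class ABound:
    index: int  # de Bruijn index
    annot: Annot


@dataclass(frozen=True)
class AAbs:
    type: str
    body: "ATerm"
    annot: Annot


@dataclass(frozen=True)
class AApp:
    fn: "ATerm"
    arg: "ATerm"
    annot: Annot


ATerm = Union[AAbs, AApp, AConst, AVar, ABound]


def leaf_annotations(t: ATerm) -> List[Tuple[str, Annot]]:
    """List each leaf of the term together with its annotation, left to right."""
    if isinstance(t, AApp):
        return leaf_annotations(t.fn) + leaf_annotations(t.arg)
    if isinstance(t, AAbs):
        return leaf_annotations(t.body)
    if isinstance(t, ABound):
        return [(f"#{t.index}", t.annot)]
    return [(t.name, t.annot)]


# The example term given in the text.
example = AApp(AConst("f", "out"),
               AApp(AConst("g", "in"), AVar("x", "none"), "in"),
               "out")
```

Because every node carries its own annotation, the adjacent outward front around f and inward front around g are both recorded, which is the situation the embedding representation could not express.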



5.4 Choices in the Direction of Wave Fronts 



Whether using Smaill and Green’s embedding mechanism or our annotated 
terms, one still has to worry about the direction of wave fronts. Initially, they 
are always outward but after applying a rule there is a choice of direction for 
each wave front. 

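For example, returning to the proof of the commutativity of addition, the initial
annotated goal is $\boxed{Suc(\underline{a})}^{\uparrow} + \lfloor b \rfloor = \lfloor b \rfloor + \boxed{Suc(\underline{a})}^{\uparrow}$, but after applying the
theorem Suc(x) + y = Suc(x + y) from left to right, there are two possible ways
the new goal can be annotated:

$$\boxed{Suc(\underline{a + \lfloor b \rfloor})}^{\uparrow} = \lfloor b \rfloor + \boxed{Suc(\underline{a})}^{\uparrow} \qquad (3)$$

$$\boxed{Suc(\underline{a + \lfloor b \rfloor})}^{\downarrow} = \lfloor b \rfloor + \boxed{Suc(\underline{a})}^{\uparrow} \qquad (4)$$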



² Note that IsaPlanner uses a more efficient but more complex datatype that maintains
the same information as the one presented here.

Note that the static account of rippling only allows inward wave fronts where
there is a sink below the wave front (in the term structure). Without this restriction,
as needed by some of the proof critics in λClam, there are many more
possible annotations.

We observe that in order to manage the multitude of annotations, only a 
single measure needs to be stored. We call this the threshold measure. Initially, 
this is the highest measure in the ordering. After a rule is applied, the new 
annotations are analysed to yield the highest measure lower than the current 
threshold. This becomes the new threshold. If no such measure can be found 
then search backtracks over the rule application. This strategy ensures that all
possible rippling solutions are in the search space. 
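The threshold update can be sketched as follows. This Python fragment is an illustration rather than the IsaPlanner code: the function name next_threshold is an assumption, None is used to stand for an initial threshold above every measure, and a result of None signals that search should backtrack.

```python
from typing import List, Optional, Sequence, Tuple

Measure = Tuple[List[int], List[int]]  # (outward counts, inward counts)


def flatten(m: Measure) -> List[int]:
    """Compare measures as a single list starting with the outward elements."""
    return m[0] + m[1]


def next_threshold(threshold: Optional[Measure],
                   candidates: Sequence[Measure]) -> Optional[Measure]:
    """Return the highest candidate measure strictly below the threshold.

    `threshold` is None before the first ripple step (assumed to mean
    'no bound yet'). The result is None when no candidate qualifies.
    """
    below = [m for m in candidates
             if threshold is None or flatten(m) < flatten(threshold)]
    if not below:
        return None
    return max(below, key=flatten)


# Measures of the annotations of a new goal (values chosen for illustration).
candidates = [([0, 2, 0], [0, 0, 0]),
              ([0, 1, 0], [0, 1, 0]),
              ([2, 0, 0], [0, 0, 0])]
```

With a current threshold of ([1, 1, 0], [0, 0, 0]) the new threshold is ([0, 2, 0], [0, 0, 0]): the third candidate is too high, and the first is the highest of the remaining two. Choosing the highest admissible measure keeps every lower annotation available for later steps.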



5.5 Managing Multiple Annotations 



While only a single measure is needed to represent all annotations, we observe 
that the mere existence of multiple annotations for a goal can result in rippling 
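applying unnecessary proof steps. For example, when trying to prove a + 0 = a
in Peano arithmetic, we arrive at an annotated step case of
$\boxed{Suc(\underline{a})}^{\uparrow} + 0 = \boxed{Suc(\underline{a})}^{\uparrow}$, which we will rewrite with the theorem
Suc(X) + Y = Suc(X + Y), named add_Suc:

$\boxed{Suc(\underline{a})}^{\uparrow} + 0 = \boxed{Suc(\underline{a})}^{\uparrow}$    Measure: ([1, 1, 0], [0, 0, 0])

Ripple using add_Suc from left to right

$\boxed{Suc(\underline{a + 0})}^{\uparrow} = \boxed{Suc(\underline{a})}^{\uparrow}$    Measure: ([0, 2, 0], [0, 0, 0])

Ripple using add_Suc from right to left

$\boxed{Suc(\underline{a})}^{\downarrow} + 0 = \boxed{Suc(\underline{a})}^{\uparrow}$    Measure: ([0, 1, 0], [0, 0, 1])

Ripple using add_Suc from left to right

$\boxed{Suc(\underline{a + 0})}^{\downarrow} = \boxed{Suc(\underline{a})}^{\downarrow}$    Measure: ([0, 0, 0], [0, 2, 0])

Fertilise using the inductive hypothesis.

$Suc(a) = Suc(a)$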

This redundancy in rewriting steps is an important inefficiency for a number 
of reasons: the search space will be larger, the proofs found will be less readable, 
the proofs may be more brittle (have unnecessary dependencies), and when being 
used for program synthesis [13], for example, inefficient programs may be created. 
While the number of redundant proof steps is smaller if inward wave fronts
are restricted to occurring above a sink, the problem still manifests itself when
there are multiple sinks and wave fronts.

In the following section, we describe a general inefficiency with rippling, and
then present a solution that prunes the search space and thereby addresses the
problem described in this section and the more general inefficiency.



5.6 Avoiding Redundant Search in Rippling 

A simple observation which can be made during rippling is that it is often possible 
to ripple many different parts of a goal independently, and thus it is of no 
help to backtrack and try a different order. For example, in the proof of the 
commutativity of addition presented earlier, either the right hand side or the 
left hand side can be rippled out first. 

In IsaPlanner, the goal terms during rippling are cached (without anno- 
tation), so that the same rippling state is not examined more than once. This 
removes symmetry in the search space, and thus provides an efficiency improve- 
ment. By using this mechanism to keep the shortest possible proof (in terms 
of ripple steps) we also significantly reduce the problems with redundant steps 
in rippling. This mechanism is provided by a generic search space caching in 
IsaPlanner. 
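The effect of caching goal terms can be seen in a small Python sketch of depth first search with a cache of visited states. It is a generic illustration, not IsaPlanner's search space caching: the function name dfs_cached and the toy successor table are assumptions, goals are represented by their unannotated text, and the sketch only removes repeated states without attempting to keep the shortest proof.

```python
def dfs_cached(start, successors, is_goal):
    """Depth first search that never expands the same state twice.

    Returns (path, expanded): the path to the first goal state found
    (or None), and the list of states that were actually expanded.
    """
    seen = set()
    expanded = []

    def go(state, path):
        if state in seen:          # already examined via another rule order
            return None
        seen.add(state)
        expanded.append(state)
        if is_goal(state):
            return path + [state]
        for nxt in successors(state):
            found = go(nxt, path + [state])
            if found is not None:
                return found
        return None

    return go(start, []), expanded


# Toy search space: either side of the goal can be rippled first.
graph = {
    "Suc(a) + b = b + Suc(a)": ["Suc(a + b) = b + Suc(a)",
                                "Suc(a) + b = Suc(b + a)"],
    "Suc(a + b) = b + Suc(a)": ["Suc(a + b) = Suc(b + a)"],
    "Suc(a) + b = Suc(b + a)": ["Suc(a + b) = Suc(b + a)"],
    "Suc(a + b) = Suc(b + a)": [],
}
start = "Suc(a) + b = b + Suc(a)"
```

If no state is accepted as a goal, the search expands four states instead of five, because the state reached by rippling both sides is examined only once regardless of the order in which the sides were rippled.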



5.7 Implementation Details 

Rippling is encoded in IsaPlanner in two parts: a module, called the ripple 
state, that holds annotations associated with a goal, and the rippling technique 
which is defined in terms of the ripple state module. The notion of embedding is 
defined in a generic way in terms of Isabelle’s HOAS. Embeddings are used by
the ripple state and transformed into a set of possible annotations. The ripple 
state module has two main functions: firstly, to set up a new state from a goal 
and skeleton that has an initial set of annotations, and secondly, to update a 
state given a new goal. 

The abstract interface for a ripple state allows us to use different annotation 
mechanisms without changing any of the code for the rippling technique. To 
implement a new form of rippling, only a new implementation of the ripple state 
module needs be created. Furthermore, IsaPlanner supports multiple versions 
of rippling simultaneously. This provides us with a framework to test and easily 
create variations of the technique. 

IsaPlanner provides an interactive interface that can be used to trace 
through the proof planning attempt. We remark that this was particularly use- 
ful for debugging the rippling technique as well as understanding the rippling 
proofs. 

A feature of using IsaPlanner is that it allows encoded techniques to auto- 
matically generate readable, executable proof scripts of the Isabelle/Isar style. 
This is particularly beneficial when lemmas are speculated and proved as it provides
a form of automatic theory formation. For an example of a generated proof
script see Section 7. 

6 A Technique Combining Induction and Rippling 

As mentioned earlier, the most common use of rippling is to guide inductive 
proof. Moreover, rippling is particularly suited to the application of proof crit- 
ics as the annotations provide additional information that can be used when 
searching for a way to patch a failed proof attempt. Indeed, we found that a 
combination of induction with rippling, Ireland’s lemma calculation critic [12], 
and Boyer-Moore style generalisation [3] provides a powerful tool for automa- 
tion. The technique starts an inductive proof and uses rippling to solve the step 
case(s). When rippling becomes blocked, the lemma speculation and generalisa- 
tion critics are applied. The base cases are tackled using Isabelle’s simplification 
tactic which is also combined with the lemma speculation and generalisation 
critics. 

The induction technique selects and applies an induction scheme based on 
the inductively defined variables in the goal. Although there are various ways to 
select the variable for induction, such as ripple analysis [17], we found that search 
backtracks quickly enough for the choice of variable to be largely insignificant in 
the domains we examined. This is partially due to the caching mechanism that
allows proof planning to use a significant portion of the failed proof attempt.
For example, when proving […] = p · i^[…] in Peano arithmetic, wrongly trying
induction on i results in the proof of 3 of the 4 needed lemmas, and the only
additional lemma to prove is the trivial theorem x + 0 = x.

This technique combining induction and rippling is similar to that used by
Dennis and Smaill [8] in λClam. The main differences are within rippling, where
we use a different mechanism for annotation, and provide a number of efficiency
measures. Additionally, we make use of Isabelle’s induction and simplification
tactics as well as provide some further optimisation to lemma speculation as
described below. In Section 8, we briefly compare our implementation with that
in λClam.

6.1 Efficient Lemma Conjecturing and Proof 

We have attached a lemma speculation and generalisation critic to rippling and 
incorporated the following efficiency measures into the speculation and proof of 
lemmas: 

- if a conjecture is proved to be false, then the search space of possible alternative
  proofs should be pruned. Additionally, the search space of any conjecture
  of which the false one is an instance should also be pruned. At present our
  rippling technique does not use any sophisticated means of detecting false
  conjectures, although we intend to make use of Isabelle’s refutation and
  counter example finding tools in future work.

- if the search space for the proof of a conjecture is exhausted, then it seems
  reasonable (and is useful in practice) to avoid making the same conjecture
  at a later point in proof planning.

- when a lemma is successfully proved, but later the proof of the main goal
  fails, it will not help to find alternative proofs for the lemma. This suggests
  that when a lemma is proved, the search space for other proofs of the lemma
  (or an instance of it) should be pruned.

These are available in a generic form in IsaPlanner and can be used in 
any technique that speculates and tries to prove lemmas. We remark that using 
a global cache of proved lemmas is difficult in systems such as λClam where
backtracking removes derived information. 



7 A Brief Case Study in Ordinal Arithmetic 

We now briefly describe a formalisation in Isabelle/IsaPlanner of ordinal arithmetic
similar to that developed in λClam by Dennis and Smaill [8]. Ordinal
notation is defined using the following datatype:

ordinal = 0 | Suc of ordinal | Lim (nat ⇒ ordinal)

A feature of Isabelle is that the transfinite induction scheme for the ordinal 
notation is automatically generated by the datatype package [18]. The induction 
scheme is then automatically used by the induction technique in IsaPlanner. 

The arithmetic operations on ordinals are defined using Isabelle’s primitive 
recursive package. For example, addition is defined as follows: 

primrec
  ord_add_0   : "(x + 0) = (x :: Ord)"
  ord_add_Suc : "x + (Suc y) = Suc (x + y)"
  ord_add_Lim : "x + (Lim f) = Lim (λn. x + (f n))"

The other arithmetic operations are defined and named similarly. Using these 
definitions, the induction and rippling technique is able to derive and produce 
automatically Isabelle/Isar proof scripts for all the theorems proved in the work 
of Dennis and Smaill. The theorem that takes longest to prove is the following: 

theorem "x ^ (y * z) = (x ^ y) ^ z"
proof (induct "z")
  show "x ^ (y * 0) = (x ^ y) ^ 0" by (simp)
next
  fix Ord :: "Ord"
  assume ind_hyp1: "x ^ (y * Ord) = (x ^ y) ^ Ord"
  have "x ^ (y * Ord + y) = x ^ (y * Ord) * x ^ y" by (rule auto_lemma_0)
  hence "x ^ (y * Ord + y) = (x ^ y) ^ Ord * x ^ y" by (rwstep sym[OF ind_hyp1])
  hence "x ^ (y * Ord + y) = (x ^ y) ^ Suc Ord" by (rwstep ord_exp_Suc)
  thus "x ^ (y * Suc Ord) = (x ^ y) ^ Suc Ord" by (rwstep ord_mul_Suc)
next
  fix f :: "nat => Ord"
  assume ind_hyp1: "!!xa. x ^ (y * f xa) = (x ^ y) ^ f xa"
  have "Lim (λn. (x ^ y) ^ f n) = Lim (λn. (x ^ y) ^ f n)" by (simp)
  hence "Lim (λn. x ^ (y * f n)) = Lim (λn. (x ^ y) ^ f n)" by (rwstep ind_hyp1)
  hence "Lim (λn. x ^ (y * f n)) = (x ^ y) ^ Lim f" by (rwstep ord_exp_Lim)
  hence "x ^ Lim (λn. y * f n) = (x ^ y) ^ Lim f" by (rwstep ord_exp_Lim)
  thus "x ^ (y * Lim f) = (x ^ y) ^ Lim f" by (rwstep ord_mul_Lim)
qed



where ord_exp_Suc, ord_exp_Lim, ord_mul_Suc and ord_mul_Lim are the names
of the defining equations in the recursive definitions for exponentiation and
multiplication. Also note that the following needed lemmas are all automatically
conjectured and proved:



lemma auto_lemma_5 "g0 + (g2 + g1) = g0 + g2 + g1"
lemma auto_lemma_4 "g1 * g2 + g1 * g0 = g1 * (g2 + g0)"
lemma auto_lemma_3 "g1 * g0 * x = g1 * (g0 * x)"
lemma auto_lemma_1 "g1 = 0 + g1"
lemma auto_lemma_0 "x ^ (g0 + y) = x ^ g0 * x ^ y"



As a final remark, note that in the automatically generated Isar script above,
the tactic rwstep simply applies a single step of rewriting with the given theorem. 



8 Results 

We have applied our technique with depth first search to over 300 problems in a 
mixture of first and higher order domains, including a theory of lists, Peano arithmetic,
and ordinal arithmetic. A table highlighting some of the results is given in Fig 2. 

To distinguish the automation provided by the rippling technique from that 
gained by working in the richly developed theories of Isabelle, the tests were 
carried out in a formalisation without any auxiliary lemmas. All needed lemmas 
were automatically conjectured and proved. To get an idea of the improved 
automation, we note that none of the theorems shown in Figure 2 are provable 
using Isabelle’s existing automatic tactics, even after the manual application of 
induction. 

As a comparison with λClam we observe that:

- λClam has specialised methods for various domains, such as non-standard
  analysis [14], which provide it with the ability to prove some theorems not
  provable by IsaPlanner’s default rippling machinery.

- IsaPlanner makes use of Isabelle’s tactics, such as the simplifier, which is
  user configurable and can be used to provide conditional rewriting for the
  base cases of inductive proofs. This can provide IsaPlanner with automation
  not possible in λClam.

- IsaPlanner executes the proof plan, ensuring soundness of the result, whereas
  λClam is currently not interfaced to an object level theorem prover.
Domain               Theorem                                      Time          Lemmas
                                                                  (in seconds)  Proved

Properties of Lists  length l = length(rev l)                     0.2           1
                     length(xs @ ys) = length(xs) + length(ys)    0.3           1
                     rev(map f xs) = map f rev(xs)                0.3           1
                     rev(rev(xs)) = xs                            1.0           1

Peano Arithmetic     a · b = b · a                                0.1           3
                     (a · b) · c = a · (b · c)                    1.6           8
                     a^(b + c) = a^b · a^c                        2.0           11
                     a · (b · c) = b · (a · c)                    2.5           15

Ordinal Arithmetic   x · (y + z) = (x · y) + (x · z)              0.8           1
                     (a · b) · c = a · (b · c)                    1.0           2
                     x^(y + z) = x^y · x^z                        1.6           4
                     x^(y · z) = (x^y)^z                          2.0           5

Fig. 2. Some results using the induction and rippling technique in IsaPlanner showing
the theorem proved, the time taken, and number of lemmas conjectured and proved
automatically. The timings were obtained from a 2GHz Intel PC with 512MB of RAM,
and using Isabelle2004 with PolyML.



- Higher order rippling in IsaPlanner appears to be exponentially faster
  than in λClam. Simple theorems are solved in almost equivalent time but
  those with more complex proofs involving lemmas are significantly quicker to
  plan and prove in IsaPlanner. For example, the ordinal theorem
  x^(y · z) = (x^y)^z takes over five minutes in λClam compared to 2 seconds
  in IsaPlanner. We believe that this is largely due to the efficiency measures
  described in this paper.

- The resulting proof plans from IsaPlanner are readable and clear whereas
  those produced by λClam are difficult to read. For example, at present the
  proof plan generated by λClam for the associativity of addition in Peano
  arithmetic is 12 pages long (without any line breaks). The proof script
  generated by IsaPlanner is one page long and in the Isar style.

- Upon failure to prove a theorem, λClam does not give any helpful results,
  whereas IsaPlanner is able to provide the user with proofs for useful auxiliary
  lemmas. For example, upon trying to prove x^(y · z) = (x^y)^z in Peano
  arithmetic, IsaPlanner conjectures and proves 13 lemmas, including the
  associativity and distributivity rules for multiplication.

We remark that many of the automatically conjectured and proved lemmas 
can be obtained by simplification from previously generated ones. This shows 
a certain amount of redundancy in the generated lemmas. In future work, we 
intend to prune these and identify those which are of obvious use to the simplifier. 
Future work will also include support for working with theorems that do not 
contain equalities. 
9 Related Work

Boulton and Slind [2] developed an interface between Clam and HOL. Unlike our 
approach which tries to take advantage of the tactics in Isabelle, their interface 
did not use the tactics developed in HOL as part of proof planning. Additionally, 
problems were limited to being first order, whereas our approach is able to derive 
proof plans for higher order theorems. 

A general notion of annotated rewriting has been developed by Hutter [10] 
and extended to the setting of a higher order logic by Hutter and Kohlhase [11]. 
They develop a novel calculus which contains annotations. This is a mixture 
between dynamic and static rippling as after each rewrite skeleton preservation 
still needs to be checked, but the wave rules can be generated beforehand. 

A proof method that combines logical proof search and static rippling has
been implemented for the NuPrl system by Pientka and Kreitz [16]. Their implementation
is as a tactic without proof critics and focuses on the incremental
instantiation of meta variables. They employ a different measure based on the 
sum of the distances between wave fronts and sinks. 

10 Conclusions and Further Work

We have presented an account of rippling, based on the dynamic style described 
by Smaill and Green and extended it to use annotations that bear a closer sim- 
ilarity to the account of static rippling within first order domains. Additionally, 
we have exposed and treated important issues that affect the size of the search 
space. This has led to an efficient version of rippling.

We have implemented our version of rippling in IsaPlanner for use in the 
higher order logic of Isabelle. This provides a framework for comparing and 
experimenting with extensions to rippling, such as the addition of proof critics 
and the use of modified measures. We believe that this is an important step in 
the development of a unified view of this proof planning technique. 

Our version of rippling, combined with induction, lemma speculation, and 
generalisation gives improved automation in Isabelle, can generate Isar proof 
scripts and is able to conjecture and prove needed lemmas. This work also serves 
as a test-bed for the IsaPlanner framework and facilitates the application of 
proof planning techniques to interactive higher order theorem proving. 

There are many ways in which this work can be extended. It would be in- 
teresting to experiment with various mechanisms for annotation and develop a 
complete picture of the effect of the design choices for dynamic rippling. This 
would work towards a complete and formal account of dynamic rippling for a 
higher order setting. In terms of proof automation, there are many proof critics 
that could be added to our implementation and compared. This would provide 
further automation and test the flexibility of our framework. It would also be 
interesting to compare rippling with the existing simplification package in Is- 
abelle. Additionally, we would like to examine the automation that rippling can 
provide to the various large ‘real world’ theory developments in Isabelle. 

Abstract. As is the case with many theorems in complexity theory,
typical proofs of the celebrated Cook-Levin theorem showing the NP- 
completeness of satisfiability are based on a clever construction. The 
Cook-Levin theorem is proved by carefully translating a possible com- 
putation of a Turing machine into a boolean expression. As the boolean 
expression is built, it is “obvious” that it can be satisfied if and only if 
the computation corresponds to a valid and accepting computation of the 
Turing machine. The details of the argument that the translation works 
as advertised are usually glossed over; it is the translation itself that is 
discussed. In this paper, we present a formal proof of the correctness of 
the translation. The proof is verified with the theorem prover ACL2. 



1 Introduction 

This paper presents a mechanical proof of the Cook-Levin theorem. A number 
of reasons led us to this investigation. The Cook-Levin theorem is the central 
theorem in NP-completeness theory, as it was the first to demonstrate the exis- 
tence of an NP-complete problem, namely satisfiability [3,7]. Moreover, having 
taught several undergraduate and introductory graduate courses on the theory 
of computer science, one of the authors has always been uncomfortable with the 
format of most proofs in the field. Many such proofs hinge on an algorithm that 
translates an instance of a problem from one domain to another. The trans- 
formation can be quite intricate, but seldom is its correctness actually proved. 
More often the correctness of the transformation is left as being obvious. Since 
the correctness proof is almost certainly tedious, we see it as an opportunity for 
formal approaches to proof. Other efforts have used the Boyer-Moore theorem 
prover to prove similar results, such as [1, 2, 8, 9].

We chose to use the theorem prover ACL2 for our formalization. ACL2 is a 
theorem prover over a first-order logic of total functions, with minimal support 
for quantifiers. The logic of ACL2 is based on the applicative subset of Common 
Lisp. Its basic structure and inference mechanisms are taken from its predecessor, 
the Boyer-Moore theorem prover. In fact, ACL2 arose out of a desire to enhance 
the Boyer-Moore prover to make it more suitable for industrial use, and in that 
respect it has succeeded marvelously. For example, it has been used to verify 
aspects of the floating-point units of microprocessors at AMD and IBM, and it 
is also in use in the simulation and verification of microprocessors at
Rockwell-Collins. ACL2 has also been used in many other verification projects, ranging
from the algebra of polynomials to properties of the Java virtual machine [5, 6]. 

This paper does not assume familiarity with ACL2. However, we will use 
regular ACL2 syntax to introduce ACL2 definitions and theorems. We assume, 
therefore, that the reader is comfortable with Lisp notation. 

The remainder of the paper is organized as follows. In Sect. 2 we present 
an informal proof of the Cook-Levin theorem, such as the one found in many 
introductory texts. We formalize this proof in Sect. 3. The formalization in ACL2 
will follow the constructive parts of the informal proof quite closely. In Sect. 4 
we present some final thoughts and some directions for further research. 

2 An Informal Proof 

We assume the reader is familiar with Turing machines and the NP-completeness 
of satisfiability. In this section we present an informal proof of this fact, merely 
to fix the terminology and lay the foundation for the formal proof to come later. 
Our exposition follows [4] quite closely. There are other proofs of the Cook- 
Levin theorem, some more recent and easier to follow. We chose this particular 
exposition because we considered it to be the most amenable to mechanization. 

Informally, a Turing machine consists of a single tape that is divided into 
an infinite number of cells. The tape has a leftmost cell but no rightmost cell. 
The machine has a read/write head that can process a single cell at a time. The
head can also move to the left or the right one step at a time. The behavior of
the machine is governed by a finite control, with transitions based on its current
state and the tape symbol being scanned. More formally, a Turing machine is a tuple
$M = (Q, \Sigma, \delta, q_0, q_f)$, where $Q$ is a finite set of states including $q_0$ and $q_f$, $\Sigma$ is the
finite alphabet of the tape not including the special blank symbol $B$, and $\delta$ is a
relation mapping a state and a symbol into a possible move. The states $q_0$ and
$q_f$ are called the initial and accepting states of $M$, respectively.

The Cook-Levin theorem shows the relationship between Turing machines 
and satisfiability: 

Theorem 1 (Cook, Levin). Let $M$ be a Turing machine that is guaranteed to
halt on an arbitrary input $x$ after $p(n)$ steps, where $p$ is a (fixed) polynomial and
$n$ is the length of $x$. Then $L(M)$, the set of strings $x$ accepted by $M$, is polynomially
reducible to satisfiability.

Consider the behavior of machine $M$ on input $x$. Initially, the tape contains
the input $x$ followed by blanks, the head is scanning the first symbol of $x$, and
the machine $M$ is in its initial state $q_0$. The machine goes through a sequence
of steps, each of which is characterized by the contents of the tape, the position
of the head, and the internal state of the machine. After at most $p(n)$ steps,
the machine halts. If it halts while in state $q_f$ the machine accepts input $x$, and
otherwise it rejects $x$.

So a computation of the machine can be formalized as the sequence of steps
$S_0, S_1, \ldots, S_{p(n)}$, where $S_0$ corresponds to the initial configuration of the
machine with input $x$, and each $S_{i+1}$ follows from $S_i$ according to the rules of the
machine $M$. As a matter of convenience, if the machine halts before $p(n)$ steps,
we let the last step repeat so that we always end up with $p(n) + 1$ steps.

The step $S_i$ can be represented by its tape, the location of the head, and the
internal state of the machine $M$. It is also helpful to store explicitly the move
taken by $M$ from state $S_i$ to $S_{i+1}$. The tape can hold at most $p(n)$ characters,
because it takes at least one step to write a character. We may assume that all
tapes have exactly $p(n)$ characters, simply by padding the tapes with blanks on
the right. Thus the tapes in the computation can be represented by the
two-dimensional array $T(i,j)$ where $i \in [0, p(n)]$ is the step of the computation and
$j \in [1, p(n)]$ is the position of the character in the tape. We will use the notation
$T(i, *)$ to refer to all the cells in a single step of the computation.

To complete the representation of a computation, we need only represent the 
position of the head and the machine state at each step of the computation, as 
well as the moves taken between steps of the computation. A convenient way 
to do this is to encode this information in the array $T$. The value of $T(i,j)$ is
normally a symbol in the tape. But if the head is at position $j$ at step $S_i$, then
$T(i,j)$ is the composite symbol $(c, q, m)$, where $c$ is the character in position $j$
of the tape, $q$ is the state of the machine at step $S_i$, and $m$ is the move taken by
the Turing machine from step $S_i$ to step $S_{i+1}$.
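The row encoding just described can be sketched in Python. This is only an illustration and not part of the ACL2 development: the helper name `encode_row`, the tuple layout of the composite symbol, the use of `None` for the blank, and the 0-based cell index are all assumptions made here (the text numbers cells from 1).

```python
from typing import Hashable, List, Optional, Sequence

Symbol = Optional[str]   # assumption: None plays the role of the blank symbol B


def encode_row(tape: Sequence[Symbol], head: int, state: str,
               move: Hashable, width: int) -> List[object]:
    """Build one row T(i, *) of the array.

    The tape is padded with blanks on the right up to `width` cells, and the
    cell under the head is replaced by the composite symbol (c, q, m).
    `head` is a 0-based index into the padded tape.
    """
    if not 0 <= head < width:
        raise ValueError("head must lie inside the padded tape")
    row: List[object] = list(tape[:width])
    row += [None] * (width - len(row))       # pad with blanks on the right
    row[head] = (row[head], state, move)     # composite symbol at the head
    return row
```

For example, `encode_row(["a", "b"], 1, "q0", "m", 4)` yields a row whose second cell is the composite `("b", "q0", "m")` and whose last two cells are blank.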

The transformation to satisfiability is carried out using this data structure.
It is clear that the value of $T(i,j)$ is in $\mathcal{T} = \Sigma \cup \{B\} \cup (\Sigma \cup \{B\}) \times Q \times \mathrm{img}(\delta)$.
For each $i \in [0, p(n)]$, $j \in [1, p(n)]$, and $\lambda \in \mathcal{T}$ we define the proposition $C_{i,j,\lambda}$
with informal meaning $T(i,j) = \lambda$. The expression $E_x$ over these variables is
the conjunction of the following four subexpressions:

— The truth assignment really does represent a unique array T(i,j). That is, 
for each $i$ and $j$ precisely one of the $C_{i,j,\lambda}$ is true.

— The values in T(0, *) correspond to the initial configuration of the machine 
with $x$ in the input tape.

— The machine is in its final accepting state $q_f$ in $T(p(n), *)$.

— For each $i \in [1, p(n)]$, the configuration represented by $T(i, *)$ follows from
the configuration at $T(i-1, *)$.

Taken together, these expressions are satisfiable if and only if there is some valid 
computation of M that starts with the input x and ends in an accepting state. 
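As an illustration, the first of the four subexpressions can be written out explicitly. The formula below is a sketch consistent with the informal description, not a formula taken from the paper; the name $E_{\mathrm{array}}$ is introduced here only for illustration, $\mathcal{T}$ denotes the set of values a cell may take, and $\lambda, \lambda'$ range over that set.

```latex
E_{\mathrm{array}} \;=\;
  \bigwedge_{i=0}^{p(n)} \; \bigwedge_{j=1}^{p(n)}
  \left[
    \Bigl( \bigvee_{\lambda \in \mathcal{T}} C_{i,j,\lambda} \Bigr)
    \;\wedge\;
    \bigwedge_{\substack{\lambda, \lambda' \in \mathcal{T} \\ \lambda \neq \lambda'}}
      \neg \bigl( C_{i,j,\lambda} \wedge C_{i,j,\lambda'} \bigr)
  \right]
```

The disjunction says that every cell holds at least one value, and the inner conjunction says that no cell holds two different values at once.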

3 A Formal Proof 

3.1 The Turing Machine Models 

As there are many variants of Turing machines, it is important to specify pre- 
cisely which variant we are using. Our Turing machines have a semi-infinite tape 
that is allowed to grow without bounds but only to the right. The input is placed 
at the beginning of this tape. The machine has a single initial and a final state. 
Once the machine enters the final state, it is constrained to stay there. 

We encode a specific Turing machine in a data structure that contains the
machine’s alphabet, its set of states, the initial and final states, and the
transitions specifying how the machine changes state. The transitions are encoded
as a list mapping state/symbol pairs into a list of possible moves. We use
the functions ndtm-alphabet, ndtm-states, ndtm-initial, ndtm-final, and
ndtm-transition to select individual components from a Turing machine.

A configuration stores the information about a single step in the computation: 
The current contents of the tape, the position of the read/write head, and the 
current internal state of the machine. To make traversals of the tape easier, 
we split the tape into two halves. The right half of the tape begins with the 
symbol currently being scanned by the head; its remaining elements contain all 
the symbols to the right of the head in increasing order. The left half of the tape 
contains all the symbols to the left of the head in reverse order. The functions 
config-lhs, config-rhs, and config-state will be used to access the members
of this structure. 

The basic mechanics of the Turing machine are modeled by the function 
ndtm-step, which takes in a Turing machine and a configuration and returns all 
the possible configurations that may follow it: 

(defun ndtm-step (machine config)
  (let ((moves (ndtm-moves (config-state config)
                           (first (config-rhs config))
                           (ndtm-transition machine))))
    (ndtm-step-with-move-list config moves)))

The function ndtm-moves returns all the valid transitions that the Turing ma- 
chine can make when it is at the given state and looking at the given symbol on 
the tape. The function ndtm-step-with-move-list applies the selected moves 
to the configuration, returning a list containing all the resulting configurations. 

We cannot allow a machine to move the read/write head to the left when it is
in the first cell position. This is enforced in the function ndtm-step-with-move 
(called by ndtm-step-with-move-list). When the head attempts to move past 
the beginning of the tape, we leave the head scanning the first cell of the tape. 

We use a breadth-first strategy to model the non-determinism of the Turing 
machine. Using this search strategy allows us to decouple the search from the 
acceptance check. The function ndtm-step-n returns all the possible configura- 
tions that can occur after stepping through an initial configuration n times. We 
test acceptance with the function ndtm-accept which takes a list of configura- 
tions and checks to see if any of them are in the accepting state. The function 
ndtm-accepts-p takes a machine, input, and number of steps, and returns true 
if the machine accepts the given input in that number of steps. 
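The stepping and breadth-first search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the ACL2 code: a move is assumed to be a triple (next state, symbol written, direction), transitions are assumed to be a dictionary from (state, symbol) pairs to lists of moves, `None` stands for the blank, and all function names are hypothetical analogs of ndtm-step-with-move, ndtm-step, ndtm-step-n, and ndtm-accepts-p.

```python
from typing import Dict, List, Optional, Tuple

Symbol = Optional[str]                  # None stands for the blank symbol
Move = Tuple[str, Symbol, str]          # (next state, symbol written, "L" or "R")
# (left half in reverse order, right half starting at the head, state)
Config = Tuple[Tuple[Symbol, ...], Tuple[Symbol, ...], str]
Transitions = Dict[Tuple[str, Symbol], List[Move]]


def step_with_move(config: Config, move: Move) -> Config:
    """Apply a single move to a configuration."""
    lhs, rhs, _ = config
    next_state, written, direction = move
    rest = rhs[1:]
    if direction == "R":
        # The written symbol joins the left half; expose a blank if needed.
        return ((written,) + lhs, rest if rest else (None,), next_state)
    if lhs:
        # Move left: the nearest left symbol becomes the scanned symbol.
        return (lhs[1:], (lhs[0], written) + rest, next_state)
    # Head is on the first cell: it stays there instead of falling off.
    return (lhs, (written,) + rest, next_state)


def step(transitions: Transitions, config: Config) -> List[Config]:
    """Return every configuration reachable in one step."""
    _, rhs, state = config
    scanned = rhs[0] if rhs else None
    return [step_with_move(config, m)
            for m in transitions.get((state, scanned), [])]


def step_n(transitions: Transitions, configs: List[Config], n: int) -> List[Config]:
    """Breadth-first search: the frontier after n steps."""
    for _ in range(n):
        configs = [nxt for c in configs for nxt in step(transitions, c)]
    return configs


def accepts(transitions: Transitions, initial: str, final: str,
            word: str, n: int) -> bool:
    """True if some configuration after n steps is in the final state."""
    start: Config = ((), tuple(word) or (None,), initial)
    return any(state == final
               for _, _, state in step_n(transitions, [start], n))
```

Keeping `step_n` separate from the acceptance test mirrors the decoupling of search and acceptance check mentioned in the text.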

We place some restrictions on the Turing machines: We insist that once a 
machine enters its final state it should stay there; we require that a machine have 
some transition for every possible combination of internal state and tape symbol 
read; and we require some syntactic conditions, such as the initial and final states 
being listed in the possible states, and that each transition write a valid character 
in the tape and move to a valid state. These properties are encapsulated in the 
predicate valid-machine. We chose to write this as restrictions on the possible
Turing machines, rather than to enforce them in the function ndtm-step, to 
simplify the Turing machine model. 

The functions ndtm-step-n and ndtm-accept faithfully model traditional 
Turing machines. But the proof of the Cook-Levin theorem makes use of compu- 
tations, i.e., paths through the tree explored by ndtm-step-n. In contrast these 
functions only store the frontier of the tree, since Turing machines do not keep 
a “memory” of their previous states. To bridge this gap, we introduced another 
model of Turing machines, one based on computations instead of configurations. 

A computation consists of a sequence of configurations and the Turing machine
transitions or moves that link them together. Consider the sequence $S_1, S_2, \ldots, S_n$
of configurations, and further let $m_i$ be the move that transforms $S_{i-1}$
into $S_i$. Then we represent this with the list
$((S_n \,.\, m_n)\;(S_{n-1} \,.\, m_{n-1})\;\ldots\;(S_1 \,.\, \mathtt{nil}))$.
Notice that the list contains the last (or current) configuration
in the front, making it easier to extend recursively.

The functions ndtm-comp-step-n and ndtm-comp-accept are direct analogs 
of ndtm-step-n and ndtm-accept. In particular, we use the exact same search 
strategy in the ndtm-comp-* functions as we do in the ndtm-* functions. This 
simplifies the proof of the equivalence between the two Turing machine models. 
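The computation-based model can be sketched in Python as follows. This is an illustration only: the `successors` callback (which returns the possible moves from a configuration together with the resulting configurations) and all function names are assumptions introduced here, standing in for the ndtm-comp-* functions.

```python
from typing import Callable, Hashable, List, Optional, Tuple

Config = Hashable
Move = Hashable
# Newest entry first: [(S_n, m_n), ..., (S_1, None)]
Computation = List[Tuple[Config, Optional[Move]]]
Successors = Callable[[Config], List[Tuple[Move, Config]]]


def comp_step(successors: Successors, computation: Computation) -> List[Computation]:
    """Extend one computation by every move possible from its current configuration."""
    current, _ = computation[0]
    return [[(nxt, move)] + computation for move, nxt in successors(current)]


def comp_step_n(successors: Successors, computations: List[Computation],
                n: int) -> List[Computation]:
    """Breadth-first search over computations, using the same strategy as
    the configuration-based search."""
    for _ in range(n):
        computations = [ext for c in computations
                        for ext in comp_step(successors, c)]
    return computations


def frontier(computations: List[Computation]) -> List[Config]:
    """Forget the history: keep only the current configuration of each path."""
    return [c[0][0] for c in computations]
```

Because both searches expand paths in the same order, `frontier(comp_step_n(...))` should list the same configurations, in the same order, as the configuration-based search; this is the kind of correspondence that the equivalence proof relies on.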

3.2 The Model of Satisfiability 

Boolean expressions are considerably simpler than Turing machines. We must 
make clear that by “boolean expression” we mean any expression made up of 
propositional variables and the connectives “and,” “or,” and “not.” In particular, 
we do not restrict ourselves to clausal representation. 

What we need to model boolean expressions is an interpreter that can input 
arbitrary expression trees over and, or, and not as well as a list associating 
variables with values, and return the value of the expression. We defined the 
interpreter booleval that fits this description. For example, the expression 

(booleval '(and (or p q) (not r)) '((r . nil) (p . t) (q . t)))

returns true, i.e., t. In addition, we proved a number of simple theorems about 
booleval, such as the following: 

(defthm booleval-and
  (implies (equal (first x) 'and)
           (equal (booleval x a)
                  (and (booleval (second x) a)
                       (booleval (third x) a)))))

For the remainder of the proof, the actual definition of booleval was irrelevant 
and in fact disabled. Only properties such as the above were used in the proof. 
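An interpreter of this kind can be sketched in Python. This is a hedged illustration, not the ACL2 definition of booleval: expressions are assumed to be nested tuples headed by "and", "or", or "not", the constants t and nil are modeled by `True` and `False`, the association list is a list of pairs in which the first binding wins, and unassigned variables evaluate to false (a property of booleval that the paper relies on later).

```python
from typing import Hashable, List, Tuple

CONNECTIVES = ("and", "or", "not")
Alist = List[Tuple[Hashable, bool]]


def lookup(var: Hashable, alist: Alist) -> bool:
    """Value of a variable; the first binding wins, unassigned means False."""
    for key, value in alist:
        if key == var:
            return bool(value)
    return False


def booleval(expr, alist: Alist) -> bool:
    """Evaluate an expression tree over binary and/or and unary not."""
    if isinstance(expr, bool):
        return expr
    if isinstance(expr, tuple) and len(expr) > 0 and expr[0] in CONNECTIVES:
        op = expr[0]
        if op == "not":
            return not booleval(expr[1], alist)
        if op == "and":
            return booleval(expr[1], alist) and booleval(expr[2], alist)
        return booleval(expr[1], alist) or booleval(expr[2], alist)
    return lookup(expr, alist)      # anything else is treated as a variable
```

The Python analog of the example above is `booleval(("and", ("or", "p", "q"), ("not", "r")), [("r", False), ("p", True), ("q", True)])`, which evaluates to `True`.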

3.3 The Translation 

In this section, we describe the formal translation from a Turing machine instance 
into satisfiability. Rather than presenting the complete translation, we will focus 
only on the functions that will be needed in the formal proofs to follow. 

Recall that the boolean expression $E_x$ consists of the conjunction of four
parts, with rough semantics equal to “the assignment consistently represents an
array,” “the first configuration is the initial configuration for the input,” “the
final configuration is an accepting configuration,” and “successive configurations
follow each other legally, according to the rules of the machine.” Formally we
build this expression as follows:

(defun ndtm2sat (machine input nsteps ncells)
  (let ((alphabet (ndtm2sat-alphabet machine)))
    (fold-and (is-a-2d-array 1 nsteps ncells alphabet)
              (first-string-is-input ncells input machine)
              (last-string-accepts nsteps ncells machine)
              (valid-computation nsteps ncells machine))))

The function ndtm2sat-alphabet builds the alphabet of the array T(i,j). This 
includes not just the alphabet of the Turing machine, but also all the composite 
symbols $(x, q, \delta)$ encoding a tape symbol, a state, and a legal move. Note: The
function fold-and returns an expression corresponding to the conjunction of its 
arguments. 

We will now consider each subexpression, starting with is-a-2d-array. This 
function loops over all steps making sure each one is a valid string: 

(defun is-a-2d-array (step nsteps ncells alphabet)
  (declare (xargs :measure (nfix (1+ (- nsteps step)))))
  (if (or (not (integerp nsteps)) (not (integerp step))
          (> step nsteps))
      t
    (list 'and
          (is-a-string step 1 ncells alphabet)
          (is-a-2d-array (1+ step) nsteps ncells alphabet))))

The :measure is used to justify the termination of this function. All ACL2
functions are total, so ACL2 tries to prove a function terminates before accepting
it. When the termination argument is non-obvious, it is necessary to provide an
explicit :measure that ACL2 can use in the termination proof.

The definition of is-a-string is just like that of is-a-2d-array, except that
it iterates over the function is-a-character, which returns a boolean expression
that is satisfiable precisely when there is exactly one character at the given position:

(defun is-a-character (step cell alphabet)
  (list 'and
        (is-one-of-the-characters step cell alphabet)
        (is-not-two-characters step cell alphabet)))

Checking that the symbol is one of the characters is straightforward. We need 
only iterate over the alphabet and take the disjunction of all terms (prop step 
cell X) where X is one of the members of alphabet. Similarly, to make sure 
there are not two different characters at this position, we consider each member 
of alphabet against each of the remaining elements of alphabet: 

(defun is-not-2nd-character (step cell char alphabet)
  (if (endp alphabet)
      t
    (list 'and
          (list 'not
                (list 'and
                      (prop step cell char)
                      (prop step cell (first alphabet))))
          (is-not-2nd-character step cell char (rest alphabet)))))

(defun is-not-two-characters (step cell alphabet)
  (if (or (endp alphabet) (endp (rest alphabet)))
      t
    (list 'and
          (is-not-2nd-character step cell (first alphabet)
                                (rest alphabet))
          (is-not-two-characters step cell (rest alphabet)))))
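The "exactly one character" constraint can be exercised with a small Python transliteration. This is a sketch under assumptions, not the authors' code: propositions are modeled as tuples headed by "prop", the recursion used for `is_one_of_the_characters` is a guess at a definition the paper only describes in words, and `evaluate` is a throwaway evaluator that takes the set of true propositions.

```python
def prop(step, cell, char):
    """Proposition meaning 'cell `cell` at step `step` holds `char`'."""
    return ("prop", step, cell, char)


def is_one_of_the_characters(step, cell, alphabet):
    # Disjunction over the alphabet (assumed shape; the empty case is false).
    if not alphabet:
        return False
    return ("or", prop(step, cell, alphabet[0]),
            is_one_of_the_characters(step, cell, alphabet[1:]))


def is_not_2nd_character(step, cell, char, alphabet):
    # `char` does not co-occur with any member of `alphabet`.
    if not alphabet:
        return True
    return ("and",
            ("not", ("and", prop(step, cell, char),
                     prop(step, cell, alphabet[0]))),
            is_not_2nd_character(step, cell, char, alphabet[1:]))


def is_not_two_characters(step, cell, alphabet):
    # Compare each member against the members that follow it.
    if len(alphabet) < 2:
        return True
    return ("and",
            is_not_2nd_character(step, cell, alphabet[0], alphabet[1:]),
            is_not_two_characters(step, cell, alphabet[1:]))


def is_a_character(step, cell, alphabet):
    return ("and",
            is_one_of_the_characters(step, cell, alphabet),
            is_not_two_characters(step, cell, alphabet))


def evaluate(expr, true_props):
    """Evaluate an expression given the set of propositions that are true."""
    if isinstance(expr, bool):
        return expr
    op = expr[0]
    if op == "not":
        return not evaluate(expr[1], true_props)
    if op == "and":
        return evaluate(expr[1], true_props) and evaluate(expr[2], true_props)
    if op == "or":
        return evaluate(expr[1], true_props) or evaluate(expr[2], true_props)
    return expr in true_props       # a ("prop", ...) tuple
```

With an alphabet of three symbols, the expression built by `is_a_character` holds when exactly one of the three propositions for that cell is true, and fails when none or two of them are.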

A similar story explains first-string-is-input. We already know what
the input should be, so we need only check that the appropriate propositions are 
true. The function string-holds-values performs such a check. Given a step, 
a beginning and end tape position, and a list, it creates a conjunction specifying 
that the tape holds the characters in the given list. The only complication is that 
the first character is actually a composite symbol, so we do not know exactly 
which proposition will be true. This forces us to iterate over all legal moves when 
the machine is in its initial state and scanning the first character of the input. 

The function last-string-accepts is also quite simple. It iterates over all 
the cells in a given step, checking to see if that cell is one of the elements of the 
final alphabet. 

Not surprisingly, valid-computation is the hardest part of the translation.
The function iterates over successive steps checking that the second follows from 
the first. We do this by looping over each of the cells in the second tape, making 
sure that it is correct. The difficulty lies with validating a single cell. 

We use the function valid-cell to perform this check. A minor difficulty
has to do with boundary conditions. The cell $T(i,j)$ depends on the values of
$T(i-1, j-1)$, $T(i-1, j)$, and $T(i-1, j+1)$, which we call the neighbors of $T(i,j)$.
But when the cell $j$ is at the beginning or end of the tape, we must drop the
neighbors that lie outside the edges. This also prevents the read/write head
from scanning past the left edge of the tape. So valid-cell performs a case
split to check the position of the cell. We will avoid this complication in this
presentation, since it only splits the proof into four very similar cases.

For a cell in the “middle” of the tape, there are four ways in which $T(i,j)$
can follow from $T(i-1, *)$. First, it is possible that $T(i-1, j-1)$ is a composite
symbol corresponding to a move of the read/write head towards the right, in
which case $T(i,j)$ will become a composite symbol. A similar story holds if
$T(i-1, j+1)$ represents a move to the left. When $T(i-1, j)$ is composite, then
$T(i,j)$ will change as the machine will write a (possibly new) symbol on cell $j$.
In all other cases, $T(i,j)$ will retain the old value of $T(i-1, j)$.
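The four cases can be made concrete with a small Python sketch that computes the value of one cell from the previous row. This is an illustration under stated assumptions rather than the paper's construction: the `Move` and `Composite` records, the `pick_move` callback (which selects one move for the new head position, whereas the boolean formula takes a disjunction over all legal moves), and the 0-based indexing are all introduced here, and only cells strictly inside the tape are handled.

```python
from typing import Callable, NamedTuple, Optional, Sequence, Union

Symbol = Optional[str]               # None stands for the blank symbol


class Move(NamedTuple):
    next_state: str
    write: Symbol
    direction: str                   # "L" or "R"


class Composite(NamedTuple):
    char: Symbol
    state: str
    move: Move


Cell = Union[Symbol, Composite]


def successor_cell(prev_row: Sequence[Cell], j: int,
                   pick_move: Callable[[str, Symbol], Move]) -> Cell:
    """Value of T(i, j) given the row T(i-1, *), for a cell in the middle."""
    if not 0 < j < len(prev_row) - 1:
        raise ValueError("only cells strictly inside the tape are handled")
    left, mid, right = prev_row[j - 1], prev_row[j], prev_row[j + 1]
    if isinstance(left, Composite) and left.move.direction == "R":
        # Case 1: the head arrives from the left neighbor.
        q = left.move.next_state
        return Composite(mid, q, pick_move(q, mid))
    if isinstance(right, Composite) and right.move.direction == "L":
        # Case 2: the head arrives from the right neighbor.
        q = right.move.next_state
        return Composite(mid, q, pick_move(q, mid))
    if isinstance(mid, Composite):
        # Case 3: the head was here; the cell now holds the written symbol.
        return mid.move.write
    # Case 4: no neighboring composite affects this cell.
    return mid
```

Since a well-formed row contains exactly one composite symbol, at most one of the first three cases can apply to any given cell.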

The functions valid-moves-left, -right, -middle, and -rest check for
these cases. The first three functions scan the valid composite symbols to find the
ones that may affect the symbol $T(i,j)$. These functions return two values. The
first value is the boolean expression that will be true if and only if $T(i-1, j')$ is a
composite symbol resulting in $T(i,j)$. The second value is a list of the composite
symbols examined. This list is needed by valid-moves-rest, so it can perform
the “else” case. To make this clear, consider the definition of valid-moves-left:



(defun valid-moves-left (prevstep curstep curcell machine)
  (let* ((alphabet (cons nil (ndtm-alphabet machine)))
         (composites (strip-right
                      (composite-symbols
                       alphabet (ndtm-states machine)
                       (ndtm-transition machine)))))
    (cons (make-valid-moves prevstep (1- curcell)
                            curstep curcell
                            composites alphabet machine)
          (prop-list prevstep (1- curcell) composites))))

This function handles the case where $T(i-1, j-1)$ is a composite affecting $T(i,j)$.
The alphabet consists of the alphabet of the Turing machine and the designated
blank symbol, which is represented by nil. The auxiliary function strip-right
returns all the relevant moves, i.e., those that move to the right. The function
prop-list stores the propositions representing the fact that $T(i-1, j-1)$ is
a relevant composite symbol. The function make-valid-moves loops over the
relevant composite symbols in $T(i-1, j-1)$ and possible tape symbols in $T(i,j)$
and calls make-valid-move to generate the given constraint. The definition of
make-valid-move is given below:



(defun make-valid-move (prevstep prevcell curstep curcell
                        composite symbol machine)
  (let* ((newstate (move-nextstate (symb-move composite)))
         (moves (ndtm-moves newstate symbol
                            (ndtm-transition machine))))
    (list 'and
          (prop prevstep prevcell composite)
          (prop prevstep curcell symbol)
          (make-valid-move-list curstep curcell
                                newstate symbol moves))))



This function depends on make-valid-move-list which loops over the given 
moves and generates the appropriate composite symbol: 



(defun make-valid-move-list (curstep curcell state symbol moves)
  (if (endp moves)
      nil
    (list 'or
          (prop curstep curcell (symb symbol state (first moves)))
          (make-valid-move-list curstep curcell state
                                symbol (rest moves)))))

The function valid-moves-right is completely symmetrical; in fact, it uses 
many of the same auxiliary functions. The function valid-moves-middle is also 
very similar, but it is slightly more complicated because it takes into account 
the new symbol written by the machine. That leaves valid-moves-rest which 
handles the else case. That is, if a given cell is not affected by a neighboring 
composite symbol, then it retains its previous value: 

(defun valid-moves-rest (prevstep curstep curcell machine cases)
  (list 'and
        (list 'not (fold-or cases))
        (remains-unchanged prevstep curstep curcell
                           (cons nil (ndtm-alphabet machine)))))

The list cases contains all of the propositions encoding neighboring composite 
symbols. This is compiled from the second value of the other valid-moves-* 
functions. The function remains-unchanged iterates over the given alphabet 
making sure that $T(i-1, j) = T(i,j)$ and is a member of the alphabet.

Note in particular that remains-unchanged is called only for characters that 
are part of the real alphabet of the tape, i.e., the machine alphabet and the 
special blank character. No composite symbols are ever passed through this 
function, since the composite symbols always change according to the rules of 
the valid-moves-* functions. 

All of these constraints come together in valid-cell, which ties these func- 
tions while taking care of the special cases. The following excerpt will suffice to 
show how this function operates: 



(defun valid-cell (prevstep curstep curcell ncells machine)
  (if (> curcell 1)
      (if (< curcell ncells)
          (let ((left (valid-moves-left prevstep curstep
                                        curcell machine))
                (middle (valid-moves-middle prevstep curstep
                                            curcell machine))
                (right (valid-moves-right prevstep curstep
                                          curcell machine)))
            (fold-or (first left) (first middle) (first right)
                     (valid-moves-rest
                      prevstep curstep curcell machine
                      (append (rest left) (rest middle)
                              (rest right)))))))
    . . .)

3.4 Case I: The Turing Machine Accepts

In this section we will show that the boolean expression generated in Sect. 3.3 
is satisfied when the Turing machine accepts the input. The expression consists 
of the conjunction of four main subexpressions which we can consider in turn. 

Before delving into the details, it is worth a moment to look at the basic 
structure of the proofs. Suppose we have a valid computation of the machine 
accepting x. We want to show that a term (booleval expr alist) is true, 
where expr is the boolean expression $E_x$ and alist is a truth assignment generated from the accepting
computation. The expr is constructed by piecing together a large number of 
local terms. For example, the subexpression for valid-cell will only examine 
propositions that correspond to neighboring cells. The alist is also constructed 
in this manner. We will process the computation and translate pieces of it into 
truth assignments which are then joined together. So the essence of the proof 
will be to dive into both expr and alist, such that we can show a particular 
subexpression expr1 is true under the truth assignment alist1. Then we will
“lift” this result to the complete truth assignment, so that expr1 is satisfied by
alist. Finally, we put together all the subexpressions to complete the proof. 

We begin our study of the proof with the extraction of a truth assignment 
from a computation. A computation is a list of configurations and the moves that 
link them together, and a configuration consists of a tape and a state. The most 
basic extraction function, therefore, converts a tape into a truth assignment: 

(defun convert-tape-to-assignment (tape step cell ncells)
  (declare (xargs :measure (nfix (1+ (- ncells cell)))))
  (if (or (not (integerp ncells)) (not (integerp cell))
          (> cell ncells))
      nil
    (cons (cons (prop step cell (first tape)) t)
          (convert-tape-to-assignment (rest tape) step (1+ cell)
                                      ncells))))

This is the only place where we will assign a value to a proposition; notice 
in particular that the only propositions assigned are given a true value. The 
following routine is used to extract an assignment from a configuration: 

(defun convert-config-move-to-assignment (config move step ncells)
  (convert-tape-to-assignment
   (append (reverse (config-lhs config))
           (cons (symb (first (config-rhs config))
                       (config-state config)
                       move)
                 (rest (config-rhs config))))
   step 1 ncells))
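The two extraction functions above can be mirrored in Python to see what the resulting assignment looks like. This is a hedged transliteration, not the authors' code: propositions and composite symbols are modeled as plain tuples, `None` stands for nil (the blank), and the configuration is passed as separate `lhs`, `rhs`, and `state` arguments for simplicity.

```python
from typing import List, Sequence, Tuple


def prop(step, cell, char):
    """Proposition meaning 'cell `cell` at step `step` holds `char`'."""
    return ("prop", step, cell, char)


def convert_tape_to_assignment(tape: Sequence, step: int, cell: int,
                               ncells: int) -> List[Tuple[tuple, bool]]:
    """Assign True to one proposition per cell, from `cell` up to `ncells`.

    Cells beyond the end of `tape` receive the blank, just as (first nil)
    is nil in the Lisp version.
    """
    tape = list(tape)
    assignment = []
    for offset, position in enumerate(range(cell, ncells + 1)):
        char = tape[offset] if offset < len(tape) else None
        assignment.append((prop(step, position, char), True))
    return assignment


def convert_config_move_to_assignment(lhs: Sequence, rhs: Sequence, state,
                                      move, step: int, ncells: int):
    """Rebuild the full tape, with a composite symbol under the head."""
    scanned = rhs[0] if rhs else None
    tape = list(reversed(lhs)) + [(scanned, state, move)] + list(rhs[1:])
    return convert_tape_to_assignment(tape, step, 1, ncells)
```

For a configuration whose left half is `["b", "a"]` (stored in reverse order) and whose right half is `["c"]`, the assignment for a four-cell tape makes true the propositions for a, b, the composite symbol built from c, and a trailing blank, in that order.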

To finish the conversion of a computation to a truth assignment, it is only nec- 
essary to step over all the configurations in the computation and append the 
resulting assignments. However, there is a slight complication. The computations
associate a configuration with the move that results in that configuration,
while the composite symbols associate a state and symbol with the move that 
is possible from that configuration. So we must stagger the moves as we process 
them. In addition, we must explicitly find a possible (e.g., the first) move ex- 
tending the last configuration, since this is needed to form the composite symbol 
but it is not present in the computation. 

Now that the truth assignment is constructed, let us consider the proof that 
it represents a 2D array. We break the boolean expression down into its smallest 
terms and find the corresponding local section of the truth assignment. Recall 
how is-a-2d-array is defined in terms of is-a-string, all the way down to 
is-one-of-the-characters. So we begin by considering the latter function: 

(defthm tape-to-assignment-is-one-of-the-characters-aux
  (implies (member symbol alphabet)
           (booleval (is-one-of-the-characters step cell alphabet)
                     (cons (cons (prop step cell symbol) t)
                           alist))))

As the theorem shows, the simplest truth assignment that makes this expression 
true is one that begins with a boolean proposition corresponding to this partic- 
ular cell. As it turns out, the function convert-tape-to-assignment has just 
this property, so it is easy to show the following: 

(defthm tape-to-assignment-is-one-of-the-characters
  (implies (and (member (first tape) alphabet)
                (integerp ncells) (integerp cell)
                (<= cell ncells))
           (booleval (is-one-of-the-characters step cell alphabet)
                     (convert-tape-to-assignment tape step cell
                                                 ncells))))

The satisfiability of is-not-two-characters is easy to establish in the same 
way. So now we are ready to lift the result higher in the truth assignment. 

But this is not as simple as it would appear at first. The problem is that the 
instance of convert-tape-to-assignment used in the theorem above hardcodes 
the value of cell. We need to generalize this theorem to allow other cell values, 
such as the ones in the call from convert-config-move-to-assignment. This
assignment has some values in front of, not just behind, the one we need. 

This is not straightforward. It is possible that one of the assignments in front 
gives a different value to a proposition. Even if the assignments are disjoint, i.e.,
if they assign values to different propositions, it is possible for the combination 
to provide unexpected results. The reason for this is that booleval implicitly 
assigns a value of false to any proposition that is not explicitly assigned, which 
is a valuable property of booleval because it allows truth assignments to be 
built incrementally. Compatibility of truth assignments depends not only on the 
assignments themselves, but on the variables used in the term being evaluated. 
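The pitfall can be reproduced with a few lines of Python. This is an illustrative sketch, not the ACL2 development: the evaluator below assumes tuple-shaped expressions, first-binding-wins lookup, and a default of false for unassigned variables, and the helper names are hypothetical analogs of assigned-vars and vars-in-term.

```python
CONNECTIVES = ("and", "or", "not")


def lookup(var, alist):
    """First binding wins; an unassigned variable is implicitly false."""
    for key, value in alist:
        if key == var:
            return bool(value)
    return False


def is_compound(expr):
    return isinstance(expr, tuple) and len(expr) > 0 and expr[0] in CONNECTIVES


def booleval(expr, alist):
    if isinstance(expr, bool):
        return expr
    if is_compound(expr):
        if expr[0] == "not":
            return not booleval(expr[1], alist)
        if expr[0] == "and":
            return booleval(expr[1], alist) and booleval(expr[2], alist)
        return booleval(expr[1], alist) or booleval(expr[2], alist)
    return lookup(expr, alist)


def vars_in_term(expr):
    """Set of variables occurring in an expression."""
    if isinstance(expr, bool):
        return set()
    if is_compound(expr):
        return set().union(*(vars_in_term(e) for e in expr[1:]))
    return {expr}


def assigned_vars(alist):
    """Set of variables given an explicit value by an assignment."""
    return {key for key, _ in alist}


def front_is_harmless(expr, front, back):
    """Check one instance of the property: if `front` assigns none of the
    variables of `expr`, putting it in front of `back` changes nothing."""
    if vars_in_term(expr) & assigned_vars(front):
        return True                  # hypothesis fails; nothing to check
    return booleval(expr, front + back) == booleval(expr, back)
```

For instance, `("not", "p")` is true under `[("q", True)]` because `p` is unassigned, but it becomes false once `[("p", True)]` is placed in front, even though the two assignments mention different variables. When the front assignment mentions no variable of the term, the value is unaffected.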

To continue the proof, therefore, we have to consider the propositions that
are assigned values by a truth assignment, as well as the propositions used in 
an expression. We defined the functions assigned-vars and vars-in-term for 
this purpose. Typical theorems about these include the following: 



(defthm vars-in-term-one-of-the-characters
  (implies (member prop (vars-in-term
                         (is-one-of-the-characters step cell
                                                   alphabet)))
           (and (equal (prop-step prop) step)
                (equal (prop-cell prop) cell))))

(defthm assigned-vars-convert-tape-to-assignment
  (implies (member prop (assigned-vars
                         (convert-tape-to-assignment
                          tape step cell ncells)))
           (and (equal (prop-step prop) step)
                (<= cell (prop-cell prop))
                (<= (prop-cell prop) ncells))))

Now it is possible to lift the theorem to bigger truth assignments. We only need 
lemmas specifying how booleval composes the truth assignment. The following 
lemma is the one we need for this specific case: 



(defthm booleval-append-alist-left
  (implies (and (not (intersectp-equal (vars-in-term x)
                                       (assigned-vars a)))
                (alistp a))
           (equal (booleval x (append a b))
                  (booleval x b))))
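
The interaction between the implicit false default and the lemma above can be tried out on a small model. The following Python sketch is not the ACL2 development: the term encoding (nested tuples), the representation of a truth assignment as a list of pairs, and the Python names lookup, booleval, vars_in_term, and assigned_vars are all assumptions made for illustration.

```python
# Illustrative Python sketch; the encodings below are assumptions, not the ACL2 definitions.

def lookup(prop, alist):
    """Return the first value bound to prop; unassigned propositions default to False."""
    for key, value in alist:
        if key == prop:
            return value
    return False


def booleval(term, alist):
    """Evaluate a term: a proposition name, or ('not', t), ('and', t, u), ('or', t, u)."""
    if isinstance(term, tuple):
        op = term[0]
        if op == "not":
            return not booleval(term[1], alist)
        if op == "and":
            return booleval(term[1], alist) and booleval(term[2], alist)
        if op == "or":
            return booleval(term[1], alist) or booleval(term[2], alist)
        raise ValueError(f"unknown operator: {op!r}")
    return lookup(term, alist)


def vars_in_term(term):
    """Collect the proposition names that occur in a term."""
    if isinstance(term, tuple):
        names = set()
        for sub in term[1:]:
            names |= vars_in_term(sub)
        return names
    return {term}


def assigned_vars(alist):
    """Collect the proposition names an assignment binds explicitly."""
    return {key for key, _ in alist}


term = ("and", "p", ("not", "q"))
b = [("p", True)]
a_unrelated = [("r", True)]   # binds nothing that occurs in term
a_touching = [("q", True)]    # disjoint from b, but q occurs in term

# Hypothesis of the lemma holds: prepending a_unrelated leaves the value alone.
assert not (vars_in_term(term) & assigned_vars(a_unrelated))
print(booleval(term, a_unrelated + b) == booleval(term, b))

# Hypothesis fails: q was implicitly false under b and is now explicitly true.
print(booleval(term, a_touching + b) == booleval(term, b))
```

In this model a_touching and b assign values to different propositions, yet prepending a_touching changes the result, which is the kind of surprise the disjointness hypothesis on vars-in-term and assigned-vars rules out.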



This suffices to lift the theorem so that we know the truth assignment generated 
by convert-config-move-to-assignment satisfies is-a-string:

(defthm move-to-assignment-is-a-string
  (implies (and (no-duplicates alphabet)
                (subsetp (config-lhs config) alphabet)
                (subsetp (config-rhs config) alphabet)
                (member (symb (first (config-rhs config))
                              (config-state config)
                              move)
                        alphabet)
                (member nil alphabet)
                (not (zp ncells)))
           (booleval (is-a-string step 1 ncells alphabet)
                     (convert-config-move-to-assignment
                      config move step ncells))))

Notice the requirements that the move appear in the alphabet, and that the tape
is a subset of the alphabet. These are needed because is-a-string tests not only
that the truth assignment is a consistent representation of a string, but also that
the string is over a particular alphabet.

To complete the proof of the satisfiability of is-a-string, we need to apply the
theorem move-to-assignment-is-a-string to all the configurations in the
computation. In particular, we need to show that the hypotheses of this lemma will
be satisfied by all the configurations in the computation. But this follows when the
initial tape uses symbols only from the alphabet, since subsequent tapes will
also satisfy the requirement as long as the transitions in the machine are valid.
So we have completed the proof of the satisfiability of is-a-2d-array.

The proof of the other three major subexpressions follows the same pattern. 
The proofs of first-string-is-input and last-string-accepts do not bring
anything new to the table, so we will omit them. It is only worth noting that the 
proof of last-string-accepts depends on the fact that the last configuration 
has a composite symbol. In particular, the left tape is not allowed to grow by 
more than the number of steps, which is straightforward to show. 

That leaves the proof that the assignment satisfies valid-computation. Our 
plan is to split the tape into three parts. In the middle are the cells around the 
read/write head, which could possibly be affected by a move. The remaining 
cells are considered to be either to the left or to the right. 

So our first task is to see what happens to a character that is (far enough) to
the left of the head. Consider what happens to the actual tape. Suppose config1
and config2 are valid configurations. We explore every possible way in which
config2 can follow config1. The following theorem is representative:

(defthm cdr-lhs-tape-does-not-change-left-move-possible
  (implies (and (equal (move-direction move) 'left)
                (consp (config-lhs config1))
                (valid-step machine config1 move config2))
           (equal (rest (config-lhs config1))
                  (config-lhs config2))))
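
To see why the tail of the left tape is unaffected by a left move, it may help to look at a toy model. The Python sketch below is only an illustration under stated assumptions: the Config tuple, the step_left function, and the convention that the head scans the first element of the right tape while the left tape is stored nearest-cell-first are hypothetical stand-ins, not the ACL2 definitions of config-lhs, config-rhs, or valid-step.

```python
from collections import namedtuple

# Assumed layout: lhs is stored in reverse (lhs[0] is the cell just left of the
# head) and rhs[0] is the cell the head is scanning.
Config = namedtuple("Config", ["lhs", "state", "rhs"])


def step_left(config, new_state, written):
    """Write `written` under the head, then move the head one cell to the left."""
    if not config.lhs:
        raise ValueError("no cell to the left of the head")
    # The cell just left of the head becomes the scanned cell.
    new_rhs = [config.lhs[0], written] + config.rhs[1:]
    return Config(lhs=config.lhs[1:], state=new_state, rhs=new_rhs)


config1 = Config(lhs=["b", "a"], state="q0", rhs=["c", "d"])   # tape: a b [c] d
config2 = step_left(config1, "q1", "x")                        # tape: a [b] x d

# Counterpart of the theorem: the rest of the old left tape is the new left tape.
print(config2.lhs == config1.lhs[1:])
```

Because the left tape is kept in reverse, a left move only removes its first element, so every cell further to the left keeps both its contents and its position.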

Using this lemma, it is possible to show that if a cell has a given value and the 
cell is (far enough) to the left of the read/write head, the cell has the same value
at the next iteration. Since propositions explicitly in the truth assignments are 
assigned true, truth is equivalent to membership in the assignment. This results 
in the following theorem, which is representative of the various cases to consider: 

(defthm early-cell-in-convert-config-left-move-possible
  (implies (and (consp (config-lhs config1))
                (not (zp ncells))
                (<= (len (config-lhs config1)) ncells)
                (member prop (assigned-vars
                              (convert-config-move-to-assignment
                               config1 move step ncells)))
                (< (prop-cell prop) (len (config-lhs config1)))
                (equal (move-direction move) 'left)
                (valid-step machine config1 move config2))
           (member (prop (1+ step) (prop-cell prop)
                         (prop-char prop))
                   (assigned-vars
                    (convert-config-move-to-assignment
                     config2 move2 (1+ step) ncells)))))

Naturally, the next step is to lift this result to the larger truth assignments. 

Theorems such as the one above show precisely what happens to all the cells 
in a tape, except for two cells around the read/write head, the one which the 
head is scanning and the one to which the head will move. We have to handle 
these cases separately. Although these results are more interesting, in the sense 
that this is where the machine is performing some action, they are easier to 
prove than the ones above because we know precisely which cells are involved. 
That means we can prove an exact theorem, such as the following: 

(defthm middle-cell-in-convert-config-move-left-move-possible-1
  (implies (and (consp computation)
                (consp (rest computation))
                (equal (first (first computation)) config2)
                (equal (rest (first computation)) move)
                (equal (first (first (rest computation))) config1)
                (consp (config-lhs config1))
                (not (zp ncells))
                (<= (len (config-lhs config1)) ncells)
                (equal step (len computation))
                (equal (move-direction move) 'left)
                (valid-computation machine computation))
           (and (member (prop (1- step)
                              (len (config-lhs config1))
                              (first (config-lhs config1)))
                        (assigned-vars
                         (convert-config-move-to-assignment
                          config1 move (1- step) ncells)))
                (member (prop step
                              (len (config-lhs config1))
                              (symb (first (config-lhs config1))
                                    (config-state config2)
                                    prevmove))
                        (assigned-vars
                         (convert-config-move-to-assignment
                          config2 prevmove step ncells))))))

This theorem covers the cell position to which the head moves. A similar theorem 
takes care of the cell position originally containing the read/write head.

These lemmas are almost ready to be stitched together into the final theorem.
The missing piece is the fact that the “else” case in the definition of valid-cell, 
which allows a cell in the tape that is away from the head to retain its value, 
requires the cell not to have a relevant neighbor. So we must prove that when a 
cell is (far enough) away from the head, the cells around it are not composite. 

Combining all the theorems proved so far shows that valid-computation is
satisfied by the generated truth assignment. Then combining that with the other 
parts of the condition, we get the final result: 

(defthm valid-transformation-computation-best
  (implies (and (integerp n) (< 1 n)
                (valid-machine machine)
                (alphabet-symbol-list-p (ndtm-alphabet machine))
                (no-duplicates (ndtm2sat-alphabet machine))
                (ndtm-accepts-p machine input (1- n))
                (subsetp input (ndtm-alphabet machine)))
           (booleval (ndtm2sat machine input n n)
                     (convert-computation-to-assignment
                      machine
                      (accepting-witness machine input (1- n))
                      n))))

Note: The function accepting-witness searches for a valid, accepting computation.



3.5 Case II: The Expression Is Satisfiable 

In this section we explore the other half of the proof. We wish to show that
when the expression is satisfiable, the input x is accepted by the machine.
To do this, we will extract a computation from a truth assignment that satisfies
$E_x$ by looking at all the propositional formulas $C_{ijX}$ for each i and j, and
selecting the one X that makes it true. So the most fundamental function is
extract-char-alist, which finds the X that makes $C_{ijX}$ true:

(defun extract-char-alist (step cell alphabet alist)
  (if (endp alphabet)
      nil
    (if (booleval (prop step cell (first alphabet)) alist)
        (first alphabet)
      (extract-char-alist step cell (rest alphabet) alist))))

Notice that we must know the relevant alphabet a priori. 
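
The dependence on the alphabet can be seen in a small Python analogue. This is a sketch under assumptions, not the ACL2 function: the truth assignment is modelled as a dictionary keyed by hypothetical (step, cell, char) triples, and extract_char is an illustrative name.

```python
def extract_char(step, cell, alphabet, assignment):
    """Return the first character X in alphabet whose proposition is true, else None."""
    for char in alphabet:
        # Propositions missing from the assignment count as false.
        if assignment.get((step, cell, char), False):
            return char
    return None


assignment = {(0, 1, "a"): True, (0, 2, "b"): True}

print(extract_char(0, 1, ["a", "b"], assignment))   # searches the full alphabet
print(extract_char(0, 1, ["b"], assignment))        # "a" is not a candidate here
```

The second call finds nothing even though the assignment makes a proposition for that cell true, because the search only ever considers characters drawn from the alphabet it is given.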

With this function we can define extract-lhs-tape and extract-rhs-tape,
which extract the left and right halves of the tape, respectively. It is only
necessary to know where to split the tape. We do this by iterating over all the cells in
the tape until we find a composite cell. We wrote different functions for the left
and right halves of the tape since the former is stored in reversed order. Once
these functions are defined, it is a simple matter to write extract-config,
which extracts a configuration, and use that to define extract-computation,
which extracts the candidate valid, accepting computation.
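
As a rough illustration of the splitting step, the Python sketch below recovers a configuration from one row of cell contents. It makes several simplifying assumptions that are not taken from the ACL2 sources: a composite cell is modelled as a (char, state) pair rather than the paper's symb values, a single function handles both halves of the tape, and the names is_composite and extract_config are hypothetical.

```python
def is_composite(symbol):
    """In this sketch a composite cell is a (char, state) pair; plain cells are strings."""
    return isinstance(symbol, tuple)


def extract_config(row):
    """Split a row at its first composite cell into (lhs, state, rhs), or return None."""
    for k, symbol in enumerate(row):
        if is_composite(symbol):
            char, state = symbol
            lhs = list(reversed(row[:k]))    # left half is stored in reversed order
            rhs = [char] + row[k + 1:]       # the head scans the first cell of rhs
            return lhs, state, rhs
    return None


row = ["a", "b", ("c", "q0"), "d"]
print(extract_config(row))
```

Reversing the left half at extraction time is what makes the cell nearest the head the first element, matching the convention that the left tape is stored in reversed order.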

At this point, we have a situation similar to the one we faced in trying to 
prove $E_x$ is satisfiable if there is a valid computation. It would appear that the
remaining part of the proof is as difficult as what has gone before, since in both 
cases we are considering an expression of the form (booleval expr alist). 
But there is a key difference. Previously, we had a valid computation and we 
used that to extract an alist. The extraction process was localized, so it was 
necessary to dig into portions of the alist to find the part that made a particular 
expression true. But in this case, the alist is known a priori, so we need only 
split expr into its subexpressions, leaving the alist unchanged. 

The key point is that we break up the structure of expr, not of alist. Since 
the function booleval is defined precisely in this way, this leads to much simpler 
lemmas, without worrying about issues such as inconsistent truth assignments. 
This came as a very pleasant discovery for us. We noticed that the proof in 
this direction was much easier partly because so much more of the proof was 
discovered automatically by ACL2. It was in trying to understand why we were 
so lucky that we discovered the delicious asymmetry of booleval. Since the 
proof is much more mechanical in this direction, we will only present an outline. 

Notice that the function extract-computation is guaranteed to extract only
one computation. However, it is conceivable that more than one computation could
be extracted from the truth assignment. Of course this is not the case, and at first we
thought there was no real need to prove this, but it turns out that this uniqueness
property is crucial in the other proofs. Many times it will not be enough to
know that $C_{ijX}$ is true; we must also know that $C_{ijY}$ is false for all $Y \neq X$.

As before, the strategy is to isolate what happens around the read/write
head. This corresponds to the composite symbol, so it is necessary to know that
there is only one composite symbol at any step in the truth assignment. We do 
this by counting the number of composite symbols in a given step. If we know 
that this number is equal to 1 and we find a composite symbol at some cell, then 
we are guaranteed that none of the symbols in other cells are composite. 
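
The counting argument can be sketched in a few lines of Python. As before, this is an illustration only: modelling composite cells as tuples and the names count_composites and others_are_plain are assumptions made here, not part of the ACL2 proof.

```python
def count_composites(row):
    """Number of composite cells in one step; composite cells are modelled as tuples."""
    return sum(1 for symbol in row if isinstance(symbol, tuple))


def others_are_plain(row, k):
    """True when no cell other than position k is composite."""
    return all(not isinstance(s, tuple) for j, s in enumerate(row) if j != k)


row = ["a", ("b", "q0"), "c"]

# If the count is 1 and position 1 is composite, every other cell must be plain.
if count_composites(row) == 1 and isinstance(row[1], tuple):
    print(others_are_plain(row, 1))
```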

Next we show that if a cell changes from one step to the next, then one
of its neighbors must be a composite symbol. Moreover, the composite symbol
is restricted based on its relationship to the cell that changed. For example, if
$T(i,j) \neq T(i-1,j)$ and $T(i-1,j-1)$ is a composite, then it must be a composite
symbol corresponding to a right move of the tape.

To complete the proof we observe that every step in the computation has 
exactly one composite symbol. It is easy to show that if one configuration has 
only one composite symbol, the next one can have at most one such symbol, 
and if a configuration has no composite symbols neither does the next. Since the 
initial and final configurations have one composite symbol, so must all the other 
ones in the computation, and this is the one found when we split the left and 
right tapes in the transformation.

Essentially the proof is now complete. The cells that are sufficiently to the
left of the composite symbol in a step are unchanged, as are the ones that 
are sufficiently to the right. The behavior of the cells immediately around the 
composite symbol is also known. This is enough to show that two successive 
configurations extracted from the truth assignment legally follow according to 
the rules of the Turing machine. 

The only remaining complication is that the computation that is extracted
is not the computation that ndtm-comp-step-n will enumerate. But the only
difference is that the extracted computation pads the right tape with blanks to
make $p(n)$ cells. It is easy to show that such starting configurations are
equivalent, in the sense that if one of them ends in an accepting state so does the
other.



3.6 Timing Analysis 

Thus far we have ignored the issue of timing. But it is an important aspect of 
the Cook-Levin theorem that the transformation take only polynomial time, so 
we would like to address this as well. 

Unlike higher-order theorem provers, ACL2 does not provide any introspec- 
tion mechanisms that can be used for cost measurement. It does provide a mech- 
anism for defining an interpreter over certain functions, but this interpreter is 
unsuitable for measuring costs, since it uses the functions directly to evaluate 
results without opening up their definitions. 

Curiously, ACL2’s predecessor, the Boyer-Moore theorem prover, did have
a facility that would be useful in this context. In that theorem prover, every 
function definition extended a set of built-in interpreters, including v&c$ which 
computed the value and the cost of an expression. 

Without such an interpreter, however, we are forced to proceed differently. 
What we did was to define a cost-* version of each function used in the transla- 
tion. This function returns a pair, the first element being the normal value of the 
function, and the second a measure of the cost used to compute this value. For 
each such function, we also proved two theorems about it. The first states that 
the cost-* function accurately computes the function it emulates. The second 
gives an upper bound for the cost. 
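
The pattern of pairing each function with a cost-returning twin can be illustrated outside ACL2. The Python sketch below is a hypothetical example of the idea, not one of the functions from the translation: length and cost_length are names chosen here, and the cost model (one unit per element visited plus one for the base case) is an assumption.

```python
def length(xs):
    """The ordinary function: count the elements of a list."""
    n = 0
    for _ in xs:
        n += 1
    return n


def cost_length(xs):
    """The cost-* twin: return (value, cost), charging one unit per element plus one."""
    value, cost = 0, 1
    for _ in xs:
        value += 1
        cost += 1
    return value, cost


xs = ["a", "b", "c"]
value, cost = cost_length(xs)

# Analogue of the first theorem: the twin computes the same value.
print(value == length(xs))
# Analogue of the second theorem: the cost is bounded by a polynomial in the input size.
print(cost <= len(xs) + 1)
```

These checks only test one input; in the formalization the corresponding statements are proved as theorems for all inputs.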



4 Conclusions 

In this paper we described a formal proof in ACL2 of the Cook-Levin theorem. 
The formal proof fills in the gaps typically left by higher-level proofs. In partic- 
ular, we showed that the transformation mapping instances of Turing machines 
to satisfiability really does work. 

We attempted to use this proof while teaching a one-hour graduate course
introducing students to ACL2. The format of the course requires each student
to make a presentation during at least one class period. One of the challenges
in teaching a course like this is keeping students interested in each other’s
presentations. By having each student make a small contribution to a larger
research project, we hoped to establish a continuity between the presentations
that would involve them throughout the semester. Unfortunately this did not
work as planned. The students did gain experience with ACL2, and some of them
are becoming proficient in it, but they found the Cook-Levin theorem too difficult
to formalize. Not having had a course that covered this theorem in detail,
many found even the informal proof too difficult to follow.

This is a shame because the proof does follow many classic patterns of formal 
proofs: It builds a formal model of the entities involved, namely Turing machines 
and boolean expressions; it constructs mappings between them; and it shows that 
the mappings are connected in important ways. Moreover, in doing the proof we 
discovered that the asymmetry in the definition of booleval led to one half of the 
proof being much easier than the other. This is a beautiful example of the deep 
connection between recursion and induction. One direction is easier because its 
natural induction scheme mirrors the recursive structure of the function, making 
everything work smoothly. 

The major weakness in the formalization lies in the analysis of the time 
complexity of the translation. It is an important aspect of the proof that the 
translation can be performed in polynomial time. But this is not the sort of 
reasoning that comes naturally in ACL2. Currently we are investigating a way 
to extend ACL2 to introduce interpreters that can compute the cost of evaluating 
an expression as well as its value. There are some very interesting challenges, 
such as the termination proof for the interpreter.

Abstract. The Kepler Conjecture states that the densest packing of
spheres in three dimensions is the familiar cannonball arrangement. Al- 
though this statement has been regarded as obvious by chemists, a rig- 
orous mathematical proof of this fact was not obtained until 1998. 

The mathematical proof of the Kepler Conjecture runs 300 pages, and 
relies on extensive computer calculations. The refereeing process involved 
more than 12 referees over a five-year period. This talk will describe the
top-level structure of the proof of this theorem. The proof involves meth- 
ods of linear and non-linear optimization, and arguments from graph 
theory and discrete geometry. In view of the complexity of the proof and 
the difficulties that were encountered in refereeing the proof, it seems de- 
sirable to have a formal proof of this theorem. This talk will give details 
about what would be involved in giving a formal proof of this result.

Abstract. In this paper, we introduce a Foundational Proof-Carrying Code
(FPCC) framework for constructing certified code packages from typed assembly 
language that will interface with a similarly certified runtime system. Our frame- 
work permits the typed assembly language to have a “foreign function” interface, 
in which stubs, initially provided when the program is being written, are eventu- 
ally compiled and linked to code that may have been written in a language with 
a different type system, or even certified directly in the FPCC logic using a proof 
assistant. We have increased the potential scalability and flexibility of our FPCC 
system by providing a way to integrate programs compiled from different source 
type systems. In the process, we are explicitly manipulating the interface between 
Hoare logic and a syntactic type system. 



1 Introduction 

Proof-Carrying Code (PCC) [16, 17] is a framework for generating executable machine
code along with a machine-checkable proof that the code satisfies a given safety
policy. The initial PCC systems specified the safety policy using a logic extended with
many (source) language-specific rules. While allowing implementation of a scalable
system [18, 7], this approach to PCC suffers from too large a trusted computing base
(TCB). It is still difficult to trust that the components of this system - the
verification-condition generator, the proof-checker, and even the logical axioms and
typing rules - are free from error.

The development of another family of PCC implementations, known as Founda- 
tional Proof-Carrying Code (FPCC) [4,3], was intended to reduce the TCB to a min- 
imum by expressing and proving safety using only a foundational mathematical logic 
without additional language-specific axioms or typing rules. The trusted components in 
such a system are mostly reduced to a much simpler logic and the proof-checker for it. 

Both these approaches to PCC have one feature in common, which is that they
have focused on a single source language (e.g. Java or ML) and compile (type-correct)
programs from that language into machine code with a safety proof. However, the
runtime systems of these frameworks still include components that are not addressed in
the safety proof [3, 10] and that are written in a lower-level language (like C):
memory management libraries, garbage collection, debuggers, marshallers, etc. The issue of
producing a safety proof for code that is compiled and linked together from two or more
different source languages was not addressed.

* This research is based on work supported in part by DARPA OASIS grant F30602-99-1-0519,
NSF grant CCR-9901011, NSF ITR grant CCR-0081590, and NSF grant CCR-0208618. Any
opinions, findings, and conclusions contained in this document are those of the authors and do
not reflect the views of these agencies.

In this paper, we introduce an FPCC framework for constructing certified machine
code packages from typed assembly language (TAL) that will interface with a similarly
certified runtime system. Our framework permits the typed assembly language to have
a “foreign function” interface in which stubs, initially provided when the program is
being written, are eventually compiled and linked to code that may have been written
in a language with a different type system, or even certified directly in the FPCC
logic using a proof assistant. To our knowledge, this is the first account of combining
such certification proofs from languages at different levels of abstraction. While type
systems such as TAL facilitate reasoning about many programs, they are not sufficient
for certifying the most low-level system libraries. Hoare logic-style reasoning, on the
other hand, can handle low-level details very well but cannot account for embedded
code pointers in data structures, a feature common to higher-order and object-oriented
programming. We outline for the first time a way to allow both methods of verification
to interact, gaining the advantages of both and circumventing their shortcomings.

Experience has shown that foundational proofs are much harder to construct than 
those in a logic extended with type-specific axioms. The earliest FPCC systems built
proofs by constructing sophisticated semantic models of types in order to reason about
safety at the machine level. That is, the final safety proof incorporated no concept of
source level types - each type in the source language would be interpreted as a predicate 
on the machine state and the typing rules of the language would turn into lemmas which 
must prove properties about the interaction of these predicates. While it seems that 
this method of FPCC would already be amenable to achieving the goals outlined in 
the previous paragraph, the situation is complicated by the complexity of the semantic 
models [11,5, 1] that were required to support a realistic type system. Nonetheless, the 
overall framework of this paper may work equally well with the semantic approach. 

In this paper, we adopt the “syntactic” approach to FPCC, introduced in [14, 13] and
further applied to a more realistic source type system by [9, 10]. In this framework, the
machine level proofs do indeed incorporate and use the syntactic encoding of elements
of the source type system to derive safety. Previous presentations of the syntactic
approach involve a monolithic translation from type-correct source programs to a package
of certified machine code. In this paper, we refine the approach by inserting a generic
layer of reasoning above the machine code which can (1) be a target for the compilation
of typed assembly languages, (2) certify low-level runtime system components using
assertions as in Hoare logic, and (3) “glue” together these pieces by reasoning about the
compatibility of the interfaces specified by the various types of source code.

A simple diagram of our framework is given in Figure 1. Source programs are written
in a typed high-level language and then passed through a certifying compiler to
produce machine code along with a proof of safety. The source level type system may
provide a set of functionality that is accessed through a library interface. At the machine
level, there is an actual library code implementation that should satisfy that interface.
The non-trivial problem is how to design the framework such that not only will the two
pieces of machine code link together to run, but that the safety proofs originating from
two different sources are also able to “link” together, consistent with the high-level
interface specification, to produce a unified safety proof for the entire set of code.

Fig. 1. FPCC certified runtime framework.

Notice that the interaction between program and library is two-way: either piece of 
code may make direct or indirect function calls and returns to the other. Ideally, we want 
to be able to certify the library code with no knowledge of the source language and type 
system that will be interacting with it. At the same time we would like to support first- 
class code pointers at all levels of the code. Methods for handling code pointers properly 
have been one of the main challenges of FPCC and are one of the differentiating factors 
between semantic and syntactic FPCC approaches. For the framework in this paper, we 
have factored out most of the code pointer reasoning that is needed when certifying 
library code so that the proofs thereof can be relatively straightforward. 

In the following sections, after defining our machine and logic, we present the layer 
of reasoning which will serve as the common interface for code compiled from different 
sources. Then we present a typical typed assembly language, extended with library 
interfaces and external call facilities. We finally show how to compile this language to 
the target machine, expanding external function stubs, and linking in the runtime library, 
at the same time producing the proof of safety of the complete package. We conclude 
with a brief discussion of implementation in the Coq proof assistant and future and 
related work. 

2 A Machine and Logic for Certified Code 

In this section, we present the machine on which programs will run and the logic that
we use to reason about safety of the code being run. We use an idealized machine for
purposes of presentation in this paper although implementation upon the IA-32 (Intel
x86 architecture) is in progress. A “real” machine introduces many engineering details
(e.g. fixed-size integers, addressing modes, memory model, variable length instructions
and relative addressing) which we would rather avoid while presenting the central
contributions of this paper.

2.1 The Machine 

The hardware components of our idealized machine are a memory, register file, and a
special register containing the current program counter (pc). These are defined to be
the machine state, as shown in Figure 2. We use a 16-register word-addressed machine
with an unbounded memory of unlimited-size words. We also define a decoding function
Dc which decodes integer words into a structured representation of instructions
(“commands”), also shown in Figure 2. The machine is thus equipped with a Step function
that describes the (deterministic) transition from one machine state to the next,
depending on the instruction at the current pc.

\[
\begin{array}{rcl}
\mathit{Word} \ni w, i, pc & ::= & 0 \mid 1 \mid \ldots \\
\mathit{Regt} \ni r & ::= & r0 \mid r1 \mid \ldots \mid r15 \\
\mathit{Cmd} \ni c & ::= & \mathtt{add}\ r_d, r_s, r_t \mid \mathtt{addi}\ r_d, r_s, i \mid \mathtt{mov}\ r_d, r_s \mid \mathtt{movi}\ r_d, i \\
 & \mid & \mathtt{bgt}\ r_s, r_t, w \mid \mathtt{bgti}\ r_s, i, w \mid \mathtt{ld}\ r_d, r_s(i) \mid \mathtt{st}\ r_d(i), r_s \\
 & \mid & \mathtt{jd}\ w \mid \mathtt{jmp}\ r \mid \mathtt{illegal} \\
M \in \mathit{Mem} & = & \mathit{Word} \to \mathit{Word} \\
R \in \mathit{RFile} & = & \mathit{Regt} \to \mathit{Word} \\
S \in \mathit{State} & = & \mathit{Mem} \times \mathit{RFile} \times \mathit{Word}
\end{array}
\]

Fig. 2. Machine state: memory, registers, and instructions (commands).

\[
\begin{array}{ll}
\text{if } Dc(M(pc)) = & \text{then } \mathit{Step}(M, R, pc) = \\
\hline
\mathtt{add}\ r_d, r_s, r_t & (M,\ R\{r_d \mapsto R(r_s) + R(r_t)\},\ pc + 1) \\
\mathtt{addi}\ r_d, r_s, i & (M,\ R\{r_d \mapsto R(r_s) + i\},\ pc + 1) \\
\mathtt{mov}\ r_d, r_s & (M,\ R\{r_d \mapsto R(r_s)\},\ pc + 1) \\
\mathtt{movi}\ r_d, i & (M,\ R\{r_d \mapsto i\},\ pc + 1) \\
\mathtt{ld}\ r_d, r_s(i) & (M,\ R\{r_d \mapsto M(R(r_s) + i)\},\ pc + 1) \\
\mathtt{st}\ r_d(i), r_s & (M\{R(r_d) + i \mapsto R(r_s)\},\ R,\ pc + 1) \\
\mathtt{bgt}\ r_s, r_t, w & (M, R, pc + 1) \text{ when } R(r_s) \le R(r_t) \text{ and } (M, R, w) \text{ when } R(r_s) > R(r_t) \\
\mathtt{bgti}\ r_s, i, w & (M, R, pc + 1) \text{ when } R(r_s) \le i \text{ and } (M, R, w) \text{ when } R(r_s) > i \\
\mathtt{jd}\ w & (M, R, w) \\
\mathtt{jmp}\ r & (M, R, R(r)) \\
\mathtt{illegal} & (M, R, pc)
\end{array}
\]

Fig. 3. Machine semantics.

The operational semantics of the machine is given in Figure 3. The instructions’
effects are quite intuitive. The first half involve arithmetic and data movement in registers.
The ld and st instructions load and store data from/to memory. These are followed by the
conditional and unconditional branch instructions. An illegal (non-decodable) instruction
puts the machine in an infinite loop.
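
The semantics of Figure 3 can be read as a small interpreter. The Python sketch below is one possible rendering under assumptions made for illustration: commands are encoded as tuples, memory and the register file are dictionaries, unwritten memory cells read as 0, and the function name step and the decoder argument dc are hypothetical stand-ins for Step and Dc rather than the Coq encoding.

```python
# Illustrative sketch: commands are tuples, and `dc` plays the role of Dc.
# The tuple layout and the default value of unwritten memory are assumptions.

def step(mem, regs, pc, dc):
    """One transition of the idealized machine; returns the next (mem, regs, pc)."""
    cmd = dc(mem.get(pc, 0))
    op, args = cmd[0], cmd[1:]
    mem, regs = dict(mem), dict(regs)      # copy so the caller's state is untouched
    if op == "add":
        rd, rs, rt = args
        regs[rd] = regs[rs] + regs[rt]
    elif op == "addi":
        rd, rs, i = args
        regs[rd] = regs[rs] + i
    elif op == "mov":
        rd, rs = args
        regs[rd] = regs[rs]
    elif op == "movi":
        rd, i = args
        regs[rd] = i
    elif op == "ld":                       # ld rd, rs(i)
        rd, rs, i = args
        regs[rd] = mem.get(regs[rs] + i, 0)
    elif op == "st":                       # st rd(i), rs
        rd, i, rs = args
        mem[regs[rd] + i] = regs[rs]
    elif op == "bgt":                      # branch to w when R(rs) > R(rt)
        rs, rt, w = args
        return mem, regs, (w if regs[rs] > regs[rt] else pc + 1)
    elif op == "bgti":                     # branch to w when R(rs) > i
        rs, i, w = args
        return mem, regs, (w if regs[rs] > i else pc + 1)
    elif op == "jd":                       # direct jump
        return mem, regs, args[0]
    elif op == "jmp":                      # jump through a register
        return mem, regs, regs[args[0]]
    else:                                  # illegal: the state does not change
        return mem, regs, pc
    return mem, regs, pc + 1


# A toy decoder: three arbitrary words stand for three commands.
table = {100: ("movi", "r1", 5), 101: ("addi", "r1", "r1", 2), 102: ("jd", 2)}
dc = lambda word: table.get(word, ("illegal",))

state = ({0: 100, 1: 101, 2: 102}, {f"r{k}": 0 for k in range(16)}, 0)
for _ in range(3):
    state = step(*state, dc)
print(state[1]["r1"], state[2])
```

Returning the unchanged state for an illegal command is what makes such an instruction behave as an infinite loop when the step function is iterated.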

2.2 The Logic 

In order to produce FPCC packages, we need a logic in which we can express (encode) 
the operational semantics of the machine as well as define the concept and criteria of 
safety. A code producer must then provide a code executable (initial machine state) 
along with a proof that the initial state and all future transitions therefrom satisfy the 
safety condition.

The foundational logic we use is the calculus of inductive constructions (CiC) [24,
20]. CiC is an extension of the calculus of constructions (CC) [8], which is a higher- 
order typed lambda calculus. Due to limited space we forgo a discussion of CiC here 
and refer the reader unfamiliar with the system to the cited references. 

CiC has been shown to be strongly normalizing [25], hence the corresponding logic 
is consistent. It is supported by the Coq proof assistant [24], which we use to implement 
a prototype system of the results presented in this paper. 

2.3 Defining Safety and Generating Proofs 

The safety condition is a predicate expressing the fact that code will not “go wrong.” 
We say that a machine state S is safe if every state it can ever reach satisfies the safety 
policy SP: 

\[ \mathit{Safe}(S, SP) = \forall n : \mathit{Nat}.\ SP(\mathit{Step}^{n}(S)) \]

A typical safety policy may require such things as that the program counter must point
to a valid instruction address in the code area and that any writes (reads) to (from)
memory must be to (from) a properly accessible area of the data space. For the purposes of
presentation in this paper, we will be using a very simple safety policy, requiring only
that the machine is always at a valid instruction:

\[ \mathit{BasicSP}(M, R, pc) = Dc(M(pc)) \neq \mathtt{illegal} \wedge \mathit{InCodeArea}(M, pc) \]

We can easily define access controls on memory reads and writes by including another
predicate in the safety policy, SafeRdWr(M, R, pc). By reasoning over the number
of steps of computation, more complex safety policies including temporal constraints
can potentially be expressed. However, we will not be dealing with such policies here.
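
The definition of Safe quantifies over all step counts, so it cannot be established by running the machine; it has to be proved. A bounded version, however, is easy to test and can help when experimenting with a policy. The Python sketch below is such a bounded check under stated assumptions: safe_up_to, the toy step functions, and the toy policy are hypothetical names and examples, not part of the FPCC framework.

```python
def safe_up_to(state, step, policy, bound):
    """Check policy on Step^n(state) for every n from 0 to bound inclusive."""
    for _ in range(bound + 1):
        if not policy(state):
            return False
        state = step(state)
    return True


# Toy machine whose whole state is a program counter (an assumption for brevity).
code_area = {0, 1, 2}
in_code = lambda pc: pc in code_area      # plays the role of the safety policy
looping = lambda pc: (pc + 1) % 3         # stays inside the code area forever
runaway = lambda pc: pc + 1               # walks off the end of the code area

print(safe_up_to(0, looping, in_code, 100))
print(safe_up_to(0, runaway, in_code, 100))
```

A failed bounded check refutes safety, while a passed one only says that no violation occurs within the bound; the FPCC proof is what covers every n.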

The FPCC code producer has to provide an encoding¹ of the initial state $S_0$ along
with a proof $A$ that this state satisfies the safety condition BasicSP, specified by the
code consumer. The final FPCC package is thus a pair:

\[ F = (S_0 : \mathit{State},\ A : \mathit{Safe}(S_0, \mathit{BasicSP})). \]



3 A Language for Certified Machine Code (CAP) 

We know now what type of proof we are looking for; the hard part is to generate that 
proof of safety. Previous approaches for FPCC [4, 2, 5, 14] have achieved this by con- 
structing an induction hypothesis, also known as the global invariant, which can be 
proven (e.g. by induction) to hold for all states reachable from the initial state and is 
strong enough to imply the safety condition. The nature of the invariant has ranged 
from a semantic model of types at the machine level (Appel et al. [4, 2, 5, 23]) to a 
purely syntactic well-formedness property [14, 13] based on a type-correct source
program in a typed assembly language.

¹ We must trust that our encoding of the machine and its operational semantics, and the definition
of safety, are correct. Along with the logic itself and the proof-checker implementation thereof,
these make up most of our software trusted computing base (TCB).

What we have developed in this paper refines these previous approaches. We will
still be presenting a typed assembly language in Section 4, in which most source pro- 
grams are written. However, we introduce another layer between the source type system 
and the “raw” encoding of the target machine in the FPCC logic. This is a “type system” 
or “specification system” that is defined upon the machine encoding, allowing us to rea- 
son about its state using assertions that essentially capture Hoare logic-style reasoning. 
Such a layer allows more generality for reasoning than a fixed type system, yet at the 
same time is more structured than reasoning directly in the logic about the machine 
encoding. 

Our language is called CAP and it uses the same machine syntax as presented in 
Figure 2. The syntax of the additional assertion layer is given below: 

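$$\begin{array}{l}
P, Q, R \in \mathrm{Pred} = \mathrm{State} \to \mathrm{Prop} \\
\Phi \in \mathrm{CdSpec} = \mathrm{Word} \rightharpoonup (\mathrm{Word} \times \mathrm{Pred}) \\
\mathrm{CmdList} \ni \mathbb{C} ::= \emptyset \mid c :: \mathbb{C} \\
\mathrm{WordList} \ni W ::= \emptyset \mid w :: W
\end{array}$$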

The name CAP is derived from its being a “Certified Assembly Programming” lan- 
guage. An initial version was introduced in [27] and used to certify a dynamic storage 
allocation library. The version we have used for this paper introduces some minor im- 
provements such as a unified data and code memory, assertions on the whole machine 
state, and support for user-specifiable safety policies (Section 3.3). 

Assertions ($P$, $Q$, $R$) are predicates on the machine state and the code specification
($\Phi$) is a partial function mapping memory addresses to a pair of an integer and a predicate.
The integer gives the length of the command sequence at that address and the
predicate is the precondition for the block of code. (The function of this is to allow us
to specify the addresses of valid code areas of memory based on $\Phi$.)

The operational semantics of the language has already been presented in Section 2.1.
We now introduce CAP inference rules followed by some important safety theorems. 

3.1 Inference Rules 

CAP adds a layer of inference rules (“typing rules”) allowing us to prove specification 
judgments of the forms: 

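$$\begin{array}{ll}
\Phi \vdash \{P\}\ \mathbb{C} & \text{well-formed command sequence} \\
\vdash \mathbb{M} : \Phi & \text{well-formed code specification} \\
\vdash (\mathbb{M}, \mathbb{R}, pc) & \text{well-formed machine state}
\end{array}$$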

The inference rules for these judgments are shown in Figure 4. The rules for well- 
formed command sequences essentially require that if the given precondition P is satis- 
fied in the current state, there must be some postcondition Q, which is the precondition 
of the remaining sequence of commands, that holds on the state after executing one 
step. The rules directly refer to the Step function of the machine; control flow instructions
additionally use the code specification environment $\Phi$ in order to allow for the
certification of mutually dependent code blocks.
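$$\frac{c \in \{\mathtt{add}, \mathtt{addi}, \mathtt{mov}, \mathtt{movi}, \mathtt{ld}\} \qquad \forall \mathbb{S}.\ (P(\mathbb{S}) \wedge \mathrm{curcmd}(\mathbb{S}) = c) \to Q(\mathrm{Step}(\mathbb{S})) \qquad \Phi \vdash \{Q\}\ \mathbb{C}}{\Phi \vdash \{P\}\ c :: \mathbb{C}} \quad \text{(CAP-PURE)}$$

$$\frac{\forall \mathbb{S}.\ (P(\mathbb{S}) \wedge \mathrm{curcmd}(\mathbb{S}) = \mathtt{st}\ r_d(i), r_s) \to (Q(\mathrm{Step}(\mathbb{S})) \wedge \neg \mathrm{InCodeArea}(\Phi, \mathbb{S}.\mathbb{R}(r_d) + i)) \qquad \Phi \vdash \{Q\}\ \mathbb{C}}{\Phi \vdash \{P\}\ \mathtt{st}\ r_d(i), r_s :: \mathbb{C}} \quad \text{(CAP-ST)}$$

$$\frac{\begin{array}{c} \forall \mathbb{S}.\ (P(\mathbb{S}) \wedge \mathrm{curcmd}(\mathbb{S}) = \mathtt{bgt}\ r_s, r_t, w) \to \\ \big((\mathbb{S}.\mathbb{R}(r_s) \le \mathbb{S}.\mathbb{R}(r_t) \to Q(\mathrm{Step}(\mathbb{S}))) \wedge (\mathbb{S}.\mathbb{R}(r_s) > \mathbb{S}.\mathbb{R}(r_t) \to Q'(\mathrm{Step}(\mathbb{S})))\big) \\ \Phi \vdash \{Q\}\ \mathbb{C} \qquad \text{where } \Phi(w) = (n, Q') \end{array}}{\Phi \vdash \{P\}\ \mathtt{bgt}\ r_s, r_t, w :: \mathbb{C}} \quad \text{(CAP-BGT)}$$

$$\frac{\forall \mathbb{S}.\ (P(\mathbb{S}) \wedge \mathrm{curcmd}(\mathbb{S}) = \mathtt{jd}\ w) \to Q'(\mathrm{Step}(\mathbb{S})) \qquad \text{where } \Phi(w) = (n, Q')}{\Phi \vdash \{P\}\ \mathtt{jd}\ w :: \emptyset} \quad \text{(CAP-JD)}$$

$$\frac{\forall \mathbb{S}.\ (P(\mathbb{S}) \wedge \mathrm{curcmd}(\mathbb{S}) = \mathtt{jmp}\ r) \to Q'(\mathrm{Step}(\mathbb{S})) \qquad \text{where } \Phi(\mathbb{S}.\mathbb{R}(r)) = (n, Q')}{\Phi \vdash \{P\}\ \mathtt{jmp}\ r :: \emptyset} \quad \text{(CAP-JMP)}$$

$$\frac{\mathrm{Flatten}(W, \mathbb{M}, f) \qquad \Phi \vdash \{P\}\ (\mathrm{Map}(\mathrm{Dc}, W)) \qquad \text{for all } f \text{ where } \Phi(f) = (\mathrm{length}(W), P)}{\vdash \mathbb{M} : \Phi} \quad \text{(CAP-CDSPEC)}$$

$$\frac{\vdash \mathbb{M} : \Phi \qquad \Phi \vdash \{P\}\ (\mathrm{Map}(\mathrm{Dc}, W)) \qquad \mathrm{Flatten}(W, \mathbb{M}, pc) \qquad \mathrm{InCodeArea}(\Phi, pc) \qquad P(\mathbb{M}, \mathbb{R}, pc)}{\vdash (\mathbb{M}, \mathbb{R}, pc)} \quad \text{(CAP-STATE)}$$

Fig. 4. CAP inference rules.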



We group as “pure” commands all those which do not involve control flow and do
not change the memory (i.e. everything other than branches, jumps, and st). The st
command requires an additional proof that the address being stored to is not in the code
area (i.e. we do not permit self-modifying code). $\mathrm{curcmd}(\mathbb{S})$ is defined as:

$$\mathrm{curcmd}(\mathbb{M}, \mathbb{R}, pc) = \mathrm{Dc}(\mathbb{M}(pc))$$

The InCodeArea predicate in the rules uses the code addresses and sequence lengths
in $\Phi$ to determine whether a given address lies within the code area. The (CAP-CDSPEC)
rule ensures that the addresses and sequence lengths specified in $\Phi$ are consistent with
the code actually in memory.

The Flatten predicate is defined as: 

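$$\begin{array}{l}
\mathrm{Flatten}(\emptyset, \mathbb{M}, f) = \mathrm{True} \\
\mathrm{Flatten}(w :: W, \mathbb{M}, f) = (\mathbb{M}(f) = w) \wedge \mathrm{Flatten}(W, \mathbb{M}, f + 1)
\end{array}$$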
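
The two predicates can be mirrored in a few lines of Python. This is only an illustrative sketch: memory is modeled as a dictionary from addresses to words, the code specification as a dictionary from start addresses to (length, precondition) pairs, and the half-open interval used in `in_code_area` is an assumed reading of InCodeArea, whose exact definition is not given in this section. The addresses and word values are made up.

```python
def flatten(words, memory, f):
    """Flatten(W, M, f): do the words of W occupy memory[f], memory[f+1], ...?"""
    if not words:                       # Flatten(empty list, M, f) = True
        return True
    # Flatten(w :: W, M, f) = (M(f) = w) and Flatten(W, M, f + 1)
    return memory.get(f) == words[0] and flatten(words[1:], memory, f + 1)


def in_code_area(spec, addr):
    """Assumed reading of InCodeArea: addr lies in some block [start, start + n)."""
    return any(start <= addr < start + n for start, (n, _pre) in spec.items())


memory = {100: 7, 101: 8, 102: 9, 200: 5}    # made-up word contents
spec = {100: (3, None), 200: (1, None)}       # preconditions left out (None)

print(flatten([7, 8, 9], memory, 100))        # True
print(flatten([7, 8, 9], memory, 101))        # False
print(in_code_area(spec, 102), in_code_area(spec, 103))   # True False
```

The recursion in `flatten` follows the two defining equations directly; a real encoding in the logic would be a structurally recursive definition over the word list.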

3.2 Safety Properties 

The machine will execute continuously, even if an illegal instruction is encountered. 
Given a well-formed CAP state, however, we can prove that it satisfies our basic safety 
policy, and that executing the machine one step will result again in a good CAP state. 

Theorem 1 (Safety Policy and Preservation). 

For any state $\mathbb{S}$, if $\vdash \mathbb{S}$ then (1) $\mathrm{BasicSP}(\mathbb{S})$ and (2) $\vdash \mathrm{Step}^n(\mathbb{S})$ for all $n$.
For the purposes of FPCC, we are interested in obtaining safety proofs in the context
of our policy as described in Section 2.3. From Theorem 1 we can easily derive: 

Theorem 2 (CAP Safety). For any $\mathbb{S}$, if $\vdash \mathbb{S}$ then $\mathrm{Safe}(\mathbb{S}, \mathrm{BasicSP})$.

Thus, to produce an FPCC package we just need to prove that the initial machine 
state is well-formed with respect to the CAP inference rules. This provides a structured 
method for constructing FPCC packages in our logic. However, programming and rea- 
soning in CAP is still much too low-level for the practical programmer. We thus need to 
provide a method for compiling programs from a higher-level language and type system 
to CAP. The main purpose of programming directly in CAP will then be to “glue” code 
together from different source languages and to certify particularly low-level libraries 
such as memory management. In the next few sections, we present a “conventional” 
typed assembly language and show how to compile it to CAP. 

3.3 Advanced Safety Policies 

In the theorems above, and for the rest of this paper, we are only interested in proving
safety according to our basic safety policy. For handling more general safety policies
using CAP, we can extend our CAP inference rules by parameterizing them with a
“global safety predicate” SP: $\Phi \vdash_{SP} \{P\}\ \mathbb{C}$, $\vdash_{SP} \mathbb{M} : \Phi$, and $\vdash_{SP} (\mathbb{M}, \mathbb{R}, pc)$.

The inference rule for each command in this extended system requires an addi- 
tional premise that the precondition for the command implies the global safety predi- 
cate. Then, using a generalized version of Theorem 1, we can establish that: 

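Theorem 3. For any $\mathbb{S}$ and SP, if $\vdash_{SP} \mathbb{S}$ then $\mathrm{Safe}(\mathbb{S}, \lambda \mathbb{S}' : \mathrm{State}.\ SP(\mathbb{S}') \wedge \mathrm{BasicSP}(\mathbb{S}'))$.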

Threading an arbitrary SP through the typing rules is a novel feature not found 
in the initial version of CAP [27]. In that case, there was no way to specify that an 
arbitrary safety policy beyond BasicSP (which essentially provides type safety) must 
hold at every step of execution. 
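
To make the shape of the combined policy concrete, the following Python sketch conjoins a user-supplied predicate with a basic one and checks the result along a bounded run. It is a toy under stated assumptions: the "machine" is a bare program counter cycling through three addresses, `holds_for_steps` only tests finitely many steps (whereas Safe quantifies over all of them), and all names are hypothetical.

```python
def conj(sp, basic_sp):
    """Build the policy  lambda S'. SP(S') and BasicSP(S')."""
    return lambda state: sp(state) and basic_sp(state)


def holds_for_steps(state, step, policy, n):
    """Check that policy holds on state, Step(state), ..., Step^n(state)."""
    for _ in range(n + 1):
        if not policy(state):
            return False
        state = step(state)
    return True


# Toy machine (an assumption, not the paper's machine): the state is just a pc.
step = lambda pc: (pc + 1) % 3
basic_sp = lambda pc: 0 <= pc < 3        # stand-in for "at a valid instruction"
sp_ok = lambda pc: True                  # a policy the toy machine satisfies
sp_bad = lambda pc: pc != 2              # a policy it violates after two steps

print(holds_for_steps(0, step, conj(sp_ok, basic_sp), 10))    # True
print(holds_for_steps(0, step, conj(sp_bad, basic_sp), 10))   # False
```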

4 Extensible Typed Assembly Language with Runtime System 

In this section, we introduce an extensible typed assembly language (XTAL) based on 
that of Morrisett et al. [15]. After presenting the full syntax of XTAL, we give here 
only a brief overview of its static and dynamic semantics, due to space constraints 
of this paper. A more complete definition of the language can be found in the Coq 
implementation itself or the technical report [12]. 

4.1 Syntax 

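To simplify the presentation, we will use a much scaled-down version of typed assembly
language (see Figure 5): its types involve only integers, pairs, and integer arrays.
(We have extended our prototype implementation to include existential, recursive, and
polymorphic code types.) The code type $\forall[\Gamma]$ describes a code pointer that expects a
register file satisfying $\Gamma$. The register file type assigns types to the word values in each
register and the heap type keeps track of the heap values in the data heap. We have
separated the code and data heaps at this level of abstraction because the code heap will
remain the same throughout the execution of a program.

$$\begin{array}{llcl}
(\textit{type}) & \tau & ::= & \mathsf{int} \mid \mathsf{array} \mid \tau_0 \times \tau_1 \mid \forall[\Gamma] \\
(\textit{regfile type}) & \Gamma & ::= & \{r_0 : \tau_0, \ldots, r_n : \tau_n\} \\
(\textit{heap type}) & \Psi & ::= & \{l_0 \mapsto \tau_0, \ldots, l_n \mapsto \tau_n\} \\
(\textit{label}) & l & ::= & 0 \mid 1 \mid \ldots \\
(\textit{register}) & r & ::= & \mathsf{r0} \mid \mathsf{r1} \mid \ldots \mid \mathsf{r7} \\
(\textit{word val}) & v & ::= & l \mid i \\
(\textit{code heap val}) & h & ::= & \mathsf{code}\ [\Gamma].I \mid \mathsf{stub}\ [\Gamma].\emptyset \\
(\textit{heap val}) & h & ::= & [i_0, \ldots, i_n] \mid (v_0, v_1) \\
(\textit{instr}) & \iota & ::= & \mathsf{add}\ r_d, r_s, r_t \mid \mathsf{movi}\ r_d, i \mid \mathsf{movl}\ r_d, l \mid \mathsf{ld}\ r_d, r_s(i) \\
 & & \mid & \mathsf{st}\ r_d(i), r_s \mid \mathsf{bgt}\ r_s, r_t, l \mid \ldots \\
(\textit{instr seq}) & I & ::= & \iota; I \mid \mathsf{jd}\ l \mid \mathsf{jmp}\ r \\
(\textit{code heap}) & C & ::= & \{l_0 \mapsto h_0, \ldots, l_n \mapsto h_n\} \\
(\textit{data heap}) & H & ::= & \{l_0 \mapsto h_0, \ldots, l_n \mapsto h_n\} \\
(\textit{regfile}) & R & ::= & \{r_0 \mapsto v_0, \ldots, r_n \mapsto v_n\} \\
(\textit{program}) & \mathcal{P} & ::= & (C, H, R, I)
\end{array}$$

Fig. 5. XTAL syntax.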

Unlike many conventional TALs, our language supports “stub values” in its code 
heap. These are placeholders for code that will be linked in later from another source 
(outside the XTAL system). Primitive “macro” instructions that might be built into other 
TALs, such as array creation and access operations, can be provided as an external 
library with interface specified as XTAL types. We have also included a typical macro 
instruction for allocating pairs (newpair) in the language. When polymorphic types are 
added to the language, this macro instruction could potentially be provided through the 
external code interface; however, in general, providing built-in primitives can allow for 
a richer specification of the interface (see the typing rule for newpair below). 

The abstract state of an XTAL program is composed of code and data heaps, a register
file, and current instruction sequence. Labels are simply integers and the domains
of the code and data heaps are to be disjoint. Besides the newpair operation, the
arithmetic, memory access, and control flow instructions of XTAL correspond directly to
those of the machine defined in Section 2.1. The movl instruction is constrained to refer only
to code heap labels. Note that programs are written in continuation passing style; thus
every code block ends with some form of jump to another location in the code heap.

4.2 Static and Dynamic Semantics 

The dynamic (operational) semantics of the XTAL abstract machine is defined by a set
of rules of the form $\mathcal{P} \mapsto \mathcal{P}'$. This evaluation relation is entirely standard (see [15, 13])
except that the case when jumping to a stub value in the code heap is not handled. The
complete rules are omitted here.
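$$\begin{array}{ll}
\textbf{Judgment} & \textbf{Meaning} \\
\vdash \Gamma_0 \subseteq \Gamma_1 & \Gamma_0 \text{ is a register file subtype of } \Gamma_1 \\
\vdash (C, H, R, I) & (C, H, R, I) \text{ is a well-formed program} \\
\vdash C & C \text{ is a well-formed code heap} \\
\vdash H : \Psi & H \text{ is a well-formed data heap of type } \Psi \\
C; \Psi \vdash R : \Gamma & R \text{ is a well-formed reg. file of type } \Gamma \\
C \vdash h\ \mathsf{cdval} & h \text{ is a well-formed code heap value} \\
\Psi \vdash h : \tau\ \mathsf{hval} & h \text{ is a well-formed data heap value of type } \tau \\
C; \Psi \vdash v : \tau & v \text{ is a well-formed word value of type } \tau \\
C; \Gamma \vdash I & I \text{ is a well-formed instruction sequence}
\end{array}$$

Fig. 6. Static judgments.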



For the static semantics, we define a set of judgments as illustrated in Figure 6. Only 
a few of the critical XTAL typing rules are presented here. The top-level typing rule for 
XTAL programs requires well-formedness of the code and data heaps, register file, and 
current instruction sequence, and that I is somewhere in the code heap: 



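$$\frac{\vdash C \qquad \vdash H : \Psi \qquad C; \Psi \vdash R : \Gamma \qquad C; \Gamma \vdash I \qquad \exists l \in \mathrm{Dom}(C).\ C(l) = \mathsf{code}\ [\Gamma'].I' \text{ and } I \subseteq I'}{\vdash (C, H, R, I)} \quad \text{(PROG)}$$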



Heap and register file typing depends on the well-formedness of the elements in
each. Stub values are simply assumed to have the specified code type. From the
instruction typing rules, we show below the rules for newpair, jd, and jmp. The newpair
instruction expects initialization values for the newly allocated space in registers r0 and
r1 and a pointer to the new pair is put in $r_d$.



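$$\frac{C; \Gamma \vdash I}{C \vdash \mathsf{code}\ [\Gamma].I\ \mathsf{cdval}} \quad \text{(CODE)} \qquad\qquad \frac{}{C \vdash \mathsf{stub}\ [\Gamma].\emptyset\ \mathsf{cdval}}$$

$$\frac{\Gamma(\mathsf{r0}) = \tau_0 \qquad \Gamma(\mathsf{r1}) = \tau_1 \qquad C; \Gamma\{r_d : \tau_0 \times \tau_1\} \vdash I}{C; \Gamma \vdash \mathsf{newpair}\ r_d[\tau_0, \tau_1]; I} \quad \text{(IS-NEWPAIR)}$$

$$\frac{\mathrm{typeof}(C(l)) = \forall[\Gamma'] \qquad \vdash \Gamma \subseteq \Gamma'}{C; \Gamma \vdash \mathsf{jd}\ l} \quad \text{(IS-JD)}$$

$$\frac{\Gamma(r) = \forall[\Gamma'] \qquad \vdash \Gamma \subseteq \Gamma'}{C; \Gamma \vdash \mathsf{jmp}\ r} \quad \text{(IS-JMP)}$$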



Although the details of the type system are certainly important, the key thing to 
be understood here is just that we are able to encode the syntactic judgment forms of 
XTAL in our logic and prove soundness in Wright-Felleisen style [26]. We will then 
refer to these judgments in CAP assertions during the process of proving machine code 
safety. 



4.3 External Code Stub Interfaces 

XTAL can pass around pointers to arrays in its data heap but has no built-in operations 
for allocating, accessing, or modifying arrays. We provide these through code stubs: 

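$$\begin{array}{lcl}
\mathsf{newarray} & \mapsto & \mathsf{stub}\ [\{\mathsf{r0} : \mathsf{int},\ \mathsf{r1} : \mathsf{int},\ \mathsf{r7} : (\forall[\{\mathsf{r0} : \mathsf{array}\}])\}].\emptyset \\
\mathsf{arrayget} & \mapsto & \mathsf{stub}\ [\{\mathsf{r0} : \mathsf{array},\ \mathsf{r1} : \mathsf{int},\ \mathsf{r7} : (\forall[\{\mathsf{r0} : \mathsf{int}\}])\}].\emptyset \\
\mathsf{arrayset} & \mapsto & \mathsf{stub}\ [\{\mathsf{r0} : \mathsf{array},\ \mathsf{r1} : \mathsf{int},\ \mathsf{r2} : \mathsf{int},\ \mathsf{r7} : (\forall[\{\mathsf{r0} : \mathsf{array}\}])\}].\emptyset
\end{array}$$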
newarray expects a length and initial value as arguments, allocates and initializes a
new array accordingly, and then jumps to the code pointer in r7. The accessor operations 
similarly expect an array and index arguments and will return to the continuation pointer 
in r7 when they have performed the operation. As is usually the case when dealing with 
external libraries, the interfaces (code types) defined above do not provide a complete 
specification of the operations (such as bounds-checking issues). Section 5.3 discusses 
how we deal with this in the context of the safety of XTAL programs and the final 
executable machine code. 

4.4 Soundness 

As usual, we need to show that our XTAL type system is sound with respect to the 
operational semantics of the abstract machine. This can be done using the standard 
progress and preservation lemmas. However, in the presence of code stubs, the complete 
semantics of a program is undefined, so at this level of abstraction we can only assume 
that those typing rules are sound. In the next section, when compiling XTAL programs 
to the real machine and linking in code for these libraries and stubs, we will need to 
prove at that point that the linked code is sound with respect to the XTAL typing rules. 
Let us define the state when the current XTAL program is jumping to external code: 

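Definition 1 (External call state). We define the current instruction of a program,
$(C, H, R, I)$, to be an external call if $I \in \{\mathsf{jd}\ l,\ \mathsf{jmp}\ r,\ \mathsf{bgt} \ldots,\ \mathsf{bgti} \ldots\}$ and $C(l) =
\mathsf{stub}\ [\Gamma].\emptyset$ or $C(R(r)) = \mathsf{stub}\ [\Gamma].\emptyset$, as appropriate.

Theorem 4 (XTAL Progress). If $\vdash \mathcal{P}$ and the current instruction of $\mathcal{P}$ is not an external
call then there exists $\mathcal{P}'$ such that $\mathcal{P} \mapsto \mathcal{P}'$.

Theorem 5 (XTAL Preservation). If $\vdash \mathcal{P}$ and $\mathcal{P} \mapsto \mathcal{P}'$ then $\vdash \mathcal{P}'$.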

These theorems are proven by induction on the well-formed instruction premise
($C; \Gamma \vdash I$) of the top-level typing rule ($\vdash \mathcal{P}$). Of course, the proof of these must be done
entirely in the FPCC logic in which the XTAL language is encoded.

In our previous work [14, 13], we demonstrated how to get from these proofs of 
soundness directly to the FPCC safety proof. However, now we have an extra level to 
go through (the CAP system) in which we will also be linking external code to XTAL 
programs, and we must ensure safety of the complete package at the end. 

5 Compilation and Linking 

In this section we first define how abstract XTAL programs will be translated to, and laid 
out in, the real machine state (the runtime memory layout). We also define the necessary 
library routines as CAP code (the runtime system). Then, after compiling and linking 
an XTAL program to CAP, we must show how to maintain the well-formedness of that 
CAP state so that we can apply Theorem 2 to obtain the final FPCC proof of safety. 

5.1 The Runtime System 

In our simple runtime system, memory is divided into three sections: a static data area
(used for global constants and library data structures), a read-only code area (which
might be further divided into subareas for external ($\mathcal{E}$) and program code), and the
dynamic heap area, which can grow indefinitely in our idealized machine. We use a data
allocation framework where a heap limit, stored in a fixed allocation pointer register*,
designates a finite portion of the dynamic heap area as having been allocated for use.
(Our safety policy could use this to specify “readable” and “writeable” memory.)

5.2 Translating XTAL Programs to CAP 

We now outline how to construct (compile) an initial CAP machine state from an XTAL 
program. Given an initial XTAL program, we need the following (partial) functions or 
mappings to produce the CAP state: 

- $A_c : \mathit{label} \rightharpoonup \mathrm{Word}$ - a layout mapping from XTAL code heap labels to CAP
machine addresses.

- $A_d : \mathit{label} \rightharpoonup \mathrm{Word}$ - a layout mapping from XTAL data heap labels to CAP
machine addresses. Both the domain and range of the two layout functions should
be disjoint. We use $A$ without any subscript to indicate the union of the two:
$A = A_c \cup A_d$.

- $\mathcal{E} : \mathrm{Word} \rightharpoonup \mathrm{CmdList} \times \mathrm{Pred}$ - the external (from XTAL’s point of view) code
blocks and their CAP preconditions for well-formedness. Proving that these blocks
are well-formed according to the preconditions will be a proof obligation when
verifying the safety of the complete CAP state. The range of $A_c$ may overlap with
the domain of $\mathcal{E}$; these addresses are the implementation of XTAL code stubs.

With these elements, the translation from XTAL programs to CAP is quite straight- 
forward. As in [14], we can describe the translation by a set of relations and associated 
inference rules. Because of limited space, we only show here the top-level rule: 

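$$\frac{\begin{array}{c}
A \vdash (C, H) \Rightarrow \mathbb{M} \qquad A \vdash R \Rightarrow \mathbb{R} \qquad A_c \vdash I \Rightarrow \mathbb{C} \qquad \mathrm{Flatten}(\mathbb{C}, \mathbb{M}, pc) \\
\exists l.\ C(l) = \mathsf{code}\ [\Gamma].I' \wedge I \subseteq I' \wedge pc = A_c(l) + |I'| - |I| \\
\forall w \in \mathrm{Dom}(\mathcal{E}).\ \mathrm{Flatten}(\mathrm{Fst}(\mathcal{E}(w)), \mathbb{M}, w)
\end{array}}{\mathcal{E}; A \vdash (C, H, R, I) \Rightarrow (\mathbb{M}, \mathbb{R}, pc)} \quad \text{(TR-PROG)}$$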
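
The program-counter premise of the top-level rule can be illustrated with a small Python sketch. It assumes, for simplicity, that every instruction of the block occupies exactly one machine word (so newpair, which expands to several commands, is excluded), that the current sequence is a suffix of the enclosing block, and that instructions are represented by plain strings; the helper name and the addresses are hypothetical.

```python
def initial_pc(layout_c, label, block, current):
    """Compute pc = A_c(l) + |I'| - |I| for a current sequence I inside block I'.

    layout_c : dict mapping code heap labels to machine addresses (A_c)
    block    : the full instruction sequence I' stored at `label`
    current  : the remaining sequence I, which must be a suffix of `block`
    """
    offset = len(block) - len(current)
    if offset < 0 or block[offset:] != current:
        raise ValueError("current sequence is not a suffix of the block")
    return layout_c[label] + offset


layout_c = {0: 100}                                   # label 0 is laid out at 100
block = ["movi r1, 5", "add r0, r0, r1", "jmp r7"]    # I'

print(initial_pc(layout_c, 0, block, block))          # 100: start of the block
print(initial_pc(layout_c, 0, block, block[1:]))      # 101: one command executed
```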

Register files and word values translate fairly directly between XTAL and the machine.
XTAL labels are translated to machine addresses using the $A$ functions. Every
heap value in the code and data heaps must correspond to an appropriately translated
sequence of words in memory. All XTAL instructions translate directly to a single
machine command except newpair, which translates to a series of commands that adjust
the allocation pointer to make space for a new pair and then copy the initial values
from r0 and r1 into the new space. We ignore the stubs in the XTAL code heap translation
because they are handled in the top-level translation rule shown above (when $\mathcal{E}$ is
Flatten’ed).
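
The net effect of the expanded newpair commands can be sketched in Python as a bump allocation. The details here are assumptions made for illustration only: the allocation pointer is kept in a register called `ap` (a hypothetical name), the heap grows upward by two words, and the pair's components are stored in the order r0 then r1. The actual command sequence is not shown in this paper.

```python
def newpair(memory, regs, rd, ap="ap"):
    """Model the combined effect of the commands that newpair expands to."""
    addr = regs[ap]                  # first free word of the dynamic heap
    regs[ap] = addr + 2              # adjust the allocation pointer: two words
    memory[addr] = regs["r0"]        # copy first initial value into the pair
    memory[addr + 1] = regs["r1"]    # copy second initial value
    regs[rd] = addr                  # rd now points to the new pair


memory = {}
regs = {"r0": 11, "r1": 22, "ap": 1000}
newpair(memory, regs, "r3")

print(regs["r3"], regs["ap"])            # 1000 1002
print(memory[1000], memory[1001])        # 11 22
```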

5.3 Generating the CAP Proofs 

In this section we proceed in a top-down manner by first stating the main theorem we
wish to establish. The theorem says that for a given runtime system, any well-typed
XTAL program that compiles and links to the runtime will result in an initial machine
state that is well-formed according to the CAP typing rules. Applying Theorem 2, we
would then be able to produce an FPCC package certifying the safety of the initial
machine state.

* XTAL source programs use fewer registers than the actual machine provides.

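Theorem 6 (XTAL-CAP Safety Theorem). For some specified external code environment
$\mathcal{E}$, and for all $\mathcal{P}$ and $A$, if $\vdash \mathcal{P}$ (in XTAL) and $\mathcal{E}; A \vdash \mathcal{P} \Rightarrow \mathbb{S}$, then $\vdash \mathbb{S}$ (in
CAP).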

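To prove that the CAP state is well-formed (using the (CAP-STATE) rule, Figure 4),
we need a code heap specification, $\Phi$, and a top-level precondition, $P$, for
the current program counter. The code specification is generated as follows:
$\Phi = \mathrm{CpGen}(\mathcal{E}, A_c, C)$, where

$$\mathrm{CpGen}(\mathcal{E}, A_c, C)(w) = \begin{cases}
\mathrm{CpInv}(A_c, C, \Gamma) & \text{if } w \notin \mathrm{Dom}(\mathcal{E}) \text{ and } \exists l.\ A_c(l) = w \wedge C(l) = (\mathsf{code}\ [\Gamma].I) \\
\mathrm{Snd}(\mathcal{E}(w)) & \text{if } w \in \mathrm{Dom}(\mathcal{E})
\end{cases}$$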

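That is, for external code blocks, the precondition comes directly from $\mathcal{E}$, while
for code blocks that have been compiled from XTAL, the CAP preconditions are
constructed by the following definition:

$$\begin{array}{rl}
\mathrm{CpInv}(A_c, C, \Gamma) = \lambda \mathbb{S}.\ \exists A_d, \Psi, H, R. & (\vdash C) \wedge (\vdash H : \Psi) \wedge (C; \Psi \vdash R : \Gamma) \\
 & \wedge\ (A \vdash (C, H) \Rightarrow \mathbb{S}.\mathbb{M}) \wedge (A \vdash R \Rightarrow \mathbb{S}.\mathbb{R})
\end{array}$$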

For any given program, the code heap and layout ($C$ and $A_c$) must be unchanged,
therefore they are global parameters of these predicate generators. CpInv captures the
fact that at a particular machine state there is a well-typed XTAL memory and register
file that syntactically corresponds to it. We only need to specify the register file type
as an argument to CpInv because the typing rules for the well-formed register file and
heap will imply all the necessary restrictions on the data heap structure. One of the
main insights of this work is the definition of CpInv, which allows us to both establish
a syntactic invariant on CAP machine states as well as define the interface between
XTAL and library code at the CAP level. CpInv is based on a similar idea as the global
invariant defined in [14] but instead of a generic, monolithic safety proof using the
syntactic encoding of the type system, CpInv makes clear what the program-specific
preconditions are for each command (instruction) and allows for easy manipulation and
reasoning thereupon, as well as interaction with other type system-based invariants.
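
The case split in CpGen can be pictured with the Python sketch below. It is only a structural illustration: code heap values are encoded as tagged tuples, the external environment as a dictionary from addresses to (commands, precondition) pairs, and `cp_inv` is a stand-in that returns a descriptive tag instead of a real predicate on machine states. All of these representations are assumptions chosen for the example.

```python
def cp_gen(ext, layout_c, code_heap, cp_inv):
    """Return a function w -> precondition following the two cases of CpGen."""
    def spec(w):
        if w in ext:                          # external block: Snd(E(w))
            return ext[w][1]
        for label, addr in layout_c.items():  # look for l with A_c(l) = w
            hv = code_heap.get(label)
            if addr == w and hv is not None and hv[0] == "code":
                gamma = hv[1]
                return cp_inv(layout_c, code_heap, gamma)
        return None                           # w is not the start of a code block
    return spec


# Toy inputs: label 0 is XTAL code at address 100, label 1 is a stub whose
# implementation is the external block at address 500.
ext = {500: (["jmp r7"], "Q_ext")}
layout_c = {0: 100, 1: 500}
code_heap = {0: ("code", "Gamma0", ["jd 1"]), 1: ("stub", "Gamma1")}
cp_inv = lambda lc, ch, gamma: ("CpInv", gamma)   # placeholder predicate

phi = cp_gen(ext, layout_c, code_heap, cp_inv)
print(phi(100))   # ('CpInv', 'Gamma0')
print(phi(500))   # Q_ext
```

Checking the external case first reflects the side condition that the CpInv case applies only when the address is outside the domain of the external environment.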

Returning to the proof of Theorem 6, if we define the top-level precondition of the
(CAP-STATE) rule to be $\mathrm{CpInv}(A_c, C, \Gamma)$, then it is trivially satisfied on the initial state
$\mathbb{S}$ by the premises of the theorem. We now have to show well-formedness of the code
at the current program counter, $\Phi \vdash \{P\}\ \mathbb{C}$, and, in fact, proofs of the same judgment
form must be provided for each of the code blocks in the heap, according to the
(CAP-CDSPEC) rule. The correctness of the CAP code memory is shown by the theorem:

Theorem 7 (XTAL-CAP Code Heap Safety). For a specified $\mathcal{E}$, and for any XTAL
program state $(C, H, R, I)$, register file type $\Gamma$, layout functions $A$, and machine state
$(\mathbb{M}, \mathbb{R}, pc)$, such that $\vdash (C, H, R, I)$ and $\mathcal{E}; A \vdash (C, H, R, I) \Rightarrow (\mathbb{M}, \mathbb{R}, pc)$, if $\Phi =
\mathrm{CpGen}(\mathcal{E}, A_c, C)$, then $\vdash \mathbb{M} : \Phi$.
This depends in turn on the proof that each well-typed XTAL instruction sequence
translated to machine commands will be well-formed in CAP under CpInv:

Theorem 8 (XTAL-CAP Instruction Safety). For a specified $\mathcal{E}$, and for all $A_c$, $C$,
$I$, $\Gamma$, and $\mathbb{C}$ (where $\Phi = \mathrm{CpGen}(\mathcal{E}, A_c, C)$), if $C; \Gamma \vdash I$ and $A_c \vdash I \Rightarrow \mathbb{C}$, then
$\Phi \vdash \{\mathrm{CpInv}(A_c, C, \Gamma)\}\ \mathbb{C}$.

Due to space constraints, we omit details of the proof of this theorem except to
mention that it is proved by induction on $I$. In cases where the current instruction
directly maps to a machine command (i.e., other than newpair), the postcondition ($Q$ in
the CAP rules) is generated by applying CpInv to the updated XTAL register file type.
We use the XTAL safety theorems (4 and 5) here to show that $Q$ holds after one step
of execution. In the case of the expanded commands of newpair, we must construct the
intermediate postconditions by hand and then show that CpInv is re-established on the
state after the sequence of expanded commands has been completed. In the case when
jumping to external code, we use the result of Proof Obligation 10 below.

Finally, establishing the theorems above depends on satisfying some proof obliga- 
tions with respect to the external library code and its interfaces as specified at the XTAL 
level. First, we must show that the external library code is well-formed according to its 
supplied preconditions: 

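Proof Obligation 9 (External Code Safety). For a given $\mathcal{E}$, if $\Phi = \mathrm{CpGen}(\mathcal{E}, A_c, C)$
for any $A_c$ and $C$, then $\Phi \vdash \{\mathrm{Snd}(\mathcal{E}(w))\}\ \mathrm{Fst}(\mathcal{E}(w))$, for all $w \in \mathrm{Dom}(\mathcal{E})$.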

For now, we assume that the proofs of this lemma are constructed “by hand” using 
the rules for well-formedness of CAP commands. 

Secondly, when linking the external code with a particular XTAL program, where 
certain labels of the XTAL code heap are mapped to external code addresses, we have to 
show that the typing environment that would hold at any XTAL program that is jumping 
to that label implies the actual precondition of that external code: 

Proof Obligation 10 (Interface Correctness). For a given $\mathcal{E}$, $A_c$, and $C$, and for all $l$
such that $C(l) = \mathsf{stub}\ [\Gamma].\emptyset$ and $A_c(l) = w$, if $\mathrm{CpInv}(A_c, C, \Gamma)(\mathbb{S})$ then $\mathrm{Snd}(\mathcal{E}(w))(\mathbb{S})$.

These properties must be proved for each instantiation of the runtime system $\mathcal{E}$.
With them, the proofs of Theorems 8, 7, and, finally, 6 can be completed.

5.4 arrayget Example 

As a concrete example of the process discussed in the foregoing subsection, let us con- 
sider arrayget. The XTAL type interface is defined in Section 4.3. An implementation 
of this function could be: 

$\mathbb{C}_{aget}$ = [ld r8, r0(0); addi r1, r1, 1; bgt r1, r8, bnderr; add r0, r0, r1; ld r0, r0(0); jmp r7]

The runtime representation of an array in memory is a length field followed by the
actual array of data. We assume that there is some exception handling routine for
out-of-bounds accesses with a trivial precondition defined by $\mathcal{E}(\mathit{bnderr}) = (\mathbb{C}_{bnderr}, Q_{bnderr})$.
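
To see what this command sequence computes, the Python sketch below interprets it over a dictionary-based memory and register file. The instruction semantics used here (ld loads from base plus offset, bgt branches when the first register is strictly greater, jmp transfers control to the address in a register) are assumptions based on the usual reading of these mnemonics, since the machine definition of Section 2.1 is not repeated here; the memory contents and the return address 900 are made up.

```python
C_AGET = [
    ("ld", "r8", "r0", 0),           # r8 := length field of the array
    ("addi", "r1", "r1", 1),         # r1 := index + 1 (skip the length field)
    ("bgt", "r1", "r8", "bnderr"),   # out of bounds if index + 1 > length
    ("add", "r0", "r0", "r1"),       # r0 := address of the requested element
    ("ld", "r0", "r0", 0),           # r0 := the element
    ("jmp", "r7"),                   # return to the continuation in r7
]


def run_block(cmds, memory, regs):
    """Run a straight-line block; return (jump target, final registers)."""
    regs = dict(regs)
    for cmd in cmds:
        op = cmd[0]
        if op == "ld":
            _, rd, rs, i = cmd
            regs[rd] = memory[regs[rs] + i]
        elif op == "addi":
            _, rd, rs, i = cmd
            regs[rd] = regs[rs] + i
        elif op == "add":
            _, rd, rs, rt = cmd
            regs[rd] = regs[rs] + regs[rt]
        elif op == "bgt":
            _, rs, rt, target = cmd
            if regs[rs] > regs[rt]:
                return target, regs
        elif op == "jmp":
            return regs[cmd[1]], regs
        else:
            raise ValueError(f"unknown command {op}")
    raise ValueError("block did not end in a jump")


# An array of length 3 at address 50: length field, then the data words.
memory = {50: 3, 51: 10, 52: 20, 53: 30}

target, regs = run_block(C_AGET, memory, {"r0": 50, "r1": 1, "r7": 900})
print(target, regs["r0"])      # 900 20

target, _ = run_block(C_AGET, memory, {"r0": 50, "r1": 3, "r7": 900})
print(target)                  # bnderr
```

Under these assumptions, index 1 yields the second element and control returns to the address held in r7, while index 3 is rejected by the bounds check.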
Before describing the CAP assertions for the safety of $\mathbb{C}_{aget}$, notice that the code
returns indirectly to an XTAL function pointer. Similarly, the arrayget address can be
passed around in XTAL programs as a first-class code pointer. While the syntactic type
system handles these code pointers quite easily using the relevant XTAL types, dealing
with code pointers in a Hoare logic-based setup like CAP requires a little bit of
machinery.

We can thus proceed to directly define the precondition of $\mathbb{C}_{aget}$ as

$$Q_{aget} = \mathrm{CpInv}(A_c, C, \{\mathsf{r0} : \mathsf{array},\ \mathsf{r1} : \mathsf{int},\ \mathsf{r7} : (\forall[\{\mathsf{r0} : \mathsf{int}\}])\})$$

for some $A_c$ and $C$. Then we certify the library code in CAP by providing a derivation
of $\Phi \vdash \{Q_{aget}\}\ \mathbb{C}_{aget}$. We do this by applying the appropriate rules from Figure 4
to track the changes that are made to the state with each command. When we reach
the final jump to r7, we can then show that $\mathrm{CpInv}(A_c, C, \{\mathsf{r0} : \mathsf{int}\})$ holds, which must
be the precondition specified for the return code pointer by $\Phi(\mathbb{S}.\mathbb{R}(\mathsf{r7}))$ (see the definition
of $\Phi$ in the beginning of Section 5.3). The problem with this method of certifying
arrayget, however, is that we have explicitly included details about the source language
type system in its preconditions. In order to make the proof more generic, while at the
same time being able to leverage the syntactic type system for certifying code pointers,
we follow a similar approach as in [27]: First, we define generic predicates for the pre-
and postconditions, abstracting over an arbitrary external predicate, $P_{aget}$. The actual
requirements of the arrayget code are minimal (for example, that the memory area of the
array is readable according to the safety policy). The postcondition predicate relates
the state of the machine upon exiting the code block to the initial entry state:

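$$\begin{array}{rcl}
\mathrm{Pre} & = & \lambda P_{aget}.\ \lambda \mathbb{S}.\ P_{aget}(\mathbb{S}) \wedge \mathrm{SafeToRead}(\mathbb{S}.\mathbb{M},\ \mathbb{S}.\mathbb{R}(\mathsf{r0}),\ \mathbb{S}.\mathbb{R}(\mathsf{r1}) + 1) \\
\mathrm{Post} & = & \lambda (\mathbb{M}, \mathbb{R}, pc).\ \lambda (\mathbb{M}', \mathbb{R}', pc').\ \mathbb{M}' = \mathbb{M} \wedge pc' = \mathbb{R}(\mathsf{r7}) \\
 & & \wedge\ \mathbb{R}'(\mathsf{r0}) = \mathbb{M}(\mathbb{R}(\mathsf{r0}) + \mathbb{R}(\mathsf{r1}) + 1) \wedge \ldots
\end{array}$$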

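Now we certify the arrayget code block, quantifying over all $P_{aget}$ and complete
code specifications $\Phi$, but imposing some appropriate restrictions on them:

$$\begin{array}{l}
\forall P_{aget}, \Phi.\ \big(\Phi(\mathit{bnderr}) = Q_{bnderr} \wedge (\forall \mathbb{S}, \mathbb{S}'.\ \mathrm{Pre}(P_{aget})(\mathbb{S}) \wedge \mathrm{Post}(\mathbb{S})(\mathbb{S}') \to \Phi(\mathbb{S}.\mathbb{R}(\mathsf{r7}))(\mathbb{S}'))\big) \\
\qquad \to \Phi \vdash \{\mathrm{Pre}(P_{aget})\}\ \mathbb{C}_{aget}
\end{array}$$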

Thus, under the assumption that the Pre predicate holds, we can again apply the
inference rules for CAP commands to show the well-formedness of the $\mathbb{C}_{aget}$ code.
When we reach the final jump, we show that the Post predicate holds and then use that
fact with the premise of the formula above to show that it is safe to jump to the return
code pointer.
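
The Pre and Post predicates can be written down as ordinary Python functions over states represented as (memory, registers, pc) triples. This is a sketch under assumptions: the conjuncts elided by the ellipsis in Post are simply omitted, and SafeToRead is interpreted here as "every address from the base up to base plus offset is present in memory", which is a hypothetical reading since its definition depends on the safety policy.

```python
def safe_to_read(memory, base, offset):
    """Assumed meaning of SafeToRead: addresses base..base+offset are readable."""
    return all(base + k in memory for k in range(offset + 1))


def pre(p_aget):
    """Pre = lambda P_aget. lambda S. P_aget(S) and SafeToRead(S.M, S.R(r0), S.R(r1)+1)."""
    def holds(state):
        memory, regs, _pc = state
        return p_aget(state) and safe_to_read(memory, regs["r0"], regs["r1"] + 1)
    return holds


def post(entry):
    """Post relates the entry state to the exit state (elided conjuncts omitted)."""
    m, r, _pc = entry
    def holds(exit_state):
        m2, r2, pc2 = exit_state
        return (m2 == m
                and pc2 == r["r7"]
                and r2["r0"] == m[r["r0"] + r["r1"] + 1])
    return holds


memory = {50: 3, 51: 10, 52: 20, 53: 30}          # array of length 3 at address 50
entry = (memory, {"r0": 50, "r1": 1, "r7": 900}, 300)
exit_ok = (memory, {"r0": 20, "r1": 2, "r7": 900, "r8": 3}, 900)
exit_bad = (memory, {"r0": 10, "r1": 2, "r7": 900, "r8": 3}, 900)

print(pre(lambda s: True)(entry))   # True
print(post(entry)(exit_ok))         # True
print(post(entry)(exit_bad))        # False
```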

The arrayget code can thus be certified independent of any type system, by introducing
the quantified $P_{aget}$ predicate. Now, when we want to use this as an external function
for XTAL programs, we instantiate $P_{aget}$ with $Q_{aget}$ above. We have to prove the premise
of the formula above, $(\forall \mathbb{S}, \mathbb{S}'.\ \mathrm{Pre}(Q_{aget})(\mathbb{S}) \wedge \mathrm{Post}(\mathbb{S})(\mathbb{S}') \to \Phi(\mathbb{S}.\mathbb{R}(\mathsf{r7}))(\mathbb{S}'))$.
Proving this is not difficult, because we use properties of the XTAL type system to show
that from a state satisfying the precondition (i.e. there is a well-formed XTAL program
whose register file satisfies the arrayget type interface) the changes described by the
Post predicate will result in a state to which there does correspond another well-formed
XTAL program, one where the register r0 is updated with the appropriate element of the
array. Then we can let $\mathcal{E}(\mathit{arrayget}) = (\mathbb{C}_{aget}, \mathrm{Pre}(Q_{aget}))$ and we have satisfied Proof
Obligation 9. Proof Obligation 10 follows almost directly given our definition of $Q_{aget}$.

In summary, we have shown how to certify runtime library code independent of a
source language. In order to handle code pointers, we simply assume their safety as a
premise; then, when using the library with a particular source language type system, we
instantiate with a syntactic well-formedness predicate in the form of CpInv and use the
facilities of the type system for checking code pointers to prove the safety of indirect
jumps.

6 Implementation and Future Work 

We have a prototype implementation of the system presented in this paper, developed 
using the Coq proof assistant. Due to space constraints, we have left out its details 
here. As mentioned earlier in the paper, our eventual goal is to build an FPCC system 
for real IA-32 (Intel x86) machines. We have already applied the CAP type system 
to that architecture and will now need to develop a more realistic version of XTAL. 
Additionally, our experience with the Coq proof assistant leads us to believe that there 
should be more development on enhancing the automation of the proof tactics, because 
many parts of the proofs needed for this paper are not hard or complex, but tedious to 
do given the rather simplistic tactics supplied with the base Coq system. 

In this paper, we have implicitly assumed that the CAP machine code is generated 
from one of two sources: (a) XTAL source code, or (b) code written directly in CAP. 
However, more generally, our intention is to support code from multiple source type 
systems. In this case, the definition of CpGen (Section 5.3) would utilize code precondition
invariant generators (CpInv) from the multiple type systems. The general form of
each CpInv would be the same, although, of course, the particular typing environments
and judgments would be different for each system. Then we would have a series of
theorems like those in Section 5.3, specialized for each CpInv. Proof Obligation 10 would
also be generalized as necessary, requiring proofs that the interfaces between the vari- 
ous type systems are compatible. Of course there will be some amount of engineering 
required to get such a system up and running, but we believe that there is true potential 
for building a realistic, scalable FPCC framework along these lines. 

7 Related Work and Conclusion 

In the context of the original PCC systems cited in the Introduction, there has been 
recent work to improve their flexibility and reliability by removing type-system specific 
components from the framework [19]. These systems have the advantage of working, 
production-quality implementations but it is still unclear whether they can approach the 
trustworthiness goals of FPCC. 

We also mentioned the first approaches to generating FPCC, which utilized seman- 
tic models of the source type system, and their resulting complexities. Attempting to 
address and hide the complexity of the semantic soundness proofs, Juan Chen et al. [6] 
have developed LTAL, a low-level typed assembly language which is used to compile 
core ML to FPCC. LTAL is based in turn upon an abstraction layer, TML (typed ma-
chine language) [22], which is an even lower-level intermediate language. Complex 
parts of the semantic proofs, such as the indexed model of recursive types and strati- 
fied model of mutable fields, are hidden in the soundness proof of TML and as long 
as a typed assembly language can be compiled to TML, one need not worry about the 
semantic models. All the same, LTAL and TML are only assembly language type sys-
tems, albeit at a much lower level than XTAL. They do not provide CAP's generality
of reasoning nor can their type systems be used to certify their own runtime system 
components. It should be clearly noted that the ideas presented in this paper are not re- 
stricted to use with a syntactic FPCC approach, as we have pursued. Integrating LTAL 
or TML with the CAP framework of this paper to certify their runtime system compo- 
nents seems feasible as well. 

Along the syntactic approach to FPCC, Crary [9, 10] applied our methods [14, 13] 
to a realistic typed assembly language initially targeted to the Intel x86. He even went
on to specify invariants about the garbage collector interface, but beyond the interface 
the implementation is still uncertified. In his work he uses the metalogical framework 
of Twelf [21] instead of the CiC-based Coq that we have been using. 

In conclusion, there is much ongoing development of PCC technology for producing 
certified machine code from high-level source languages. Concurrently, there is exciting 
work on certifying garbage collectors and other low-level system libraries. However, 
integrating the high and low-level proofs of safety has not yet received much attention. 
The ideas presented in this paper represent a viable approach to dealing with the issue 
of interfacing and integrating safety proofs of machine code from multiple sources in a 
fully certified framework. 
Abstract. Theorem provers for higher-order logics often use tactics to
implement automated proof search. Often some basic tactics are designed 
to behave very differently in different contexts. Even in a prover that only 
supports a fixed base logic, such tactics may need to be updated dynam- 
ically as new definitions and theorems are added. In a logical framework 
with multiple (perhaps conflicting) logics, this has the added complexity
that definitions and theorems should be used for automation only
in the logic in which they are defined or proved.

This paper describes a very general and flexible mechanism for extensible 
hierarchical tactic maintenance in a logical framework. We also explain 
how this reflective mechanism can be implemented efficiently while re-
quiring little effort from its users.

The approaches presented in this paper form the core of the tactic con- 
struction methodology in the MetaPRL theorem prover, where they have 
been developed and successfully used for several years. 



1 Introduction 

Several provers [1, 2, 4-6, 9, 10, 18] use higher-order logics for reasoning because 
the expressivity of the logics permits concise problem descriptions, and because 
meta-principles that characterize entire classes of problems can be proved and 
re-used on multiple problem instances. In these provers, proof automation is 
coded in a meta-language (often a variant of ML) as tactics. 

It can be very useful for some basic tactics to be designed and/or expected 
to behave very differently in different contexts. One of the best examples of such 
a tactic is the decomposition tactic [14, Section 3.3] present in the NuPRL [1, 
4] and MetaPRL [9,13] theorem provers. When applied to the conclusion of a 
goal sequent, it will try to decompose the conclusion into simpler ones, normally 
by using an appropriate introduction rule. When applied to a hypothesis, the 
decomposition tactic would try to break the hypothesis into simpler ones, usually 
by applying an appropriate elimination rule. 
Initiative (MURI) program administered by the Office of Naval Research (ONR)
under Grant N00014-01-1-0765, the Defense Advanced Research Projects Agency
(DARPA), the United States Air Force, the Lee Center, and by NSF Grant CCR 
0204193. 
Table 1. Decomposition Tactic Examples

                            Goal sequent             Desired subgoals
Conclusion decomposition    ··· ⊢ A ∧ B              ··· ⊢ A  and  ··· ⊢ B
Hypothesis decomposition    ···; A ∧ B; ··· ⊢ ···    ···; A; B; ··· ⊢ ···



Example 1. The desired behavior for the decomposition tactic on ∧-terms is
shown in Table 1. 

Whenever a theory is extended with a new operator, the decomposition tactic 
needs to be updated in order for it to know how to decompose this new operator. 
More generally, whenever a new rule (including possibly a new axiom, a new 
definition, a new derived rule or a new theorem) is added to a system, it is often 
desirable to update some tactic (or possibly several tactics) so that it makes use 
of the newly added rule. For example, if a ∧ introduction rule is added to the
system, the decomposition tactic would be updated with the information that if
the conclusion is a ∧-term, then the new introduction rule should be used.

There are a number of problems associated with such tactic updates. A very 
important requirement is that performing these tactic updates must be easy and 
not require much effort from end-users. Our experience with the NuPRL and MetaPRL
theorem provers strongly suggests that the true power of the updatable
tactics only becomes apparent when updates are performed to account for almost 
all the new theorems and definitions added to the system. On the other hand, 
when updates require too much effort, many users forgo maintaining the general 
tactics, reverting instead to using various ad-hoc workarounds and using the 
tactics updated to handle only the core theory. 

Another class of problems are those of scoping. The updates must be managed 
in an extensible manner: when a tactic is updated to take into account a new
theorem, all new proofs should be done using the updated tactic, but the earlier 
proofs might need to still use the previous version in order not to break. If a 
theorem prover allows defining and working in different logical theories, then 
the tactic update mechanism needs to make sure that the updated tactic will 
only attempt to use a theorem when performing proof search in the appropriate 
theory. And if the prover supports inheritance between logical theories, then 
the updates mechanism needs to be compositional - if a theory is composed of 
several subtheories (each potentially including its own theorems), then the tactic 
updates from each of the subtheories need to be composed together. 

Once the tactics updates mechanism becomes simple enough to be used for 
almost all new definitions, lemmas and theorems, efficiency becomes a big con- 
cern. If each new update slows the tactic down (for example, by forcing it to 
try more branches in its proof search), then this approach to maintaining tac- 
tics would not scale. At a minimum, the updates related to, for example, a new 
definition should not have significant impact on the performance of the tactic 
when proving theorems that do not make any use of the new definition (even 
when that definition is in scope). 

In the MetaPRL theorem prover [9,13] we have implemented a number of 
very general mechanisms that provide automatic scoping management for tactic 




138 Jason Hickey and Aleksey Nogin 



updates, efficient data structures for making sure that new data does not have 
a significant impact on performance, and a way for the system to come up with 
proper tactic updates automatically requiring only very small hints from the 
user. In Sections 3, 4 and 5 we describe these mechanisms and explain how they 
help in addressing all of the issues outlined above. In Section 6 we show how 
some of the most commonly used MetaPRL tactics are implemented using these 
general mechanisms. 

2 MetaPRL 

In order to better understand this paper, it is helpful to know the basic structure 
of the MetaPRL theorem prover. 

The core of the prover is its logical engine written in OCaml [15, 20]. MetaPRL
theories are implemented as OCaml modules, with each theory having a separate
interface and implementation files containing both logical contents (definitions, 
axioms, theorems, etc) as well as the traditional ML contents (including tac- 
tics). The logical theories are organized into an inheritance hierarchy [9, Section 
5.1], where large logical theories are constructed by inheriting from a number 
of smaller ones. The MetaPRL frontend is a CamlP4-based preprocessor that is 
capable of turning the mixture of MetaPRL content and ML code into plain ML 
code and passing it to the OCaml compiler. Finally, MetaPRL has a user interface
that includes a proof editor. 

3 Resources 

The process of maintaining context-sensitive tactics is automated
through a mechanism called resources. A resource is essentially a collection of 
scoped pieces of data. 

Example 2. The decomposition tactic of Example 1 could be implemented using 
a combination of two resources - a resource collecting information on introduc- 
tion rules (“intro resource”) and one collecting information on elimination rules 
(“elim resource”). For each of the two resources, the data points that would be
passed to the resource manager for collection will each consist of a pattern (e.g.
A ∧ B) paired with the corresponding tactic (e.g. a tactic that would apply a ∧
introduction rule).

The MetaPRL resource interface provides a scoping mechanism based on 
the inheritance hierarchy of logical theories. Resources are managed on a per- 
theorem granularity - when working on a particular proof, the resource state 
reflects everything collected from the current theory up to the theorem being 
proved, as well as everything inherited from the theories that are ancestors of 
the current one in the logical hierarchy, given in the appropriate order. 

Our implementation of the resource mechanism has two layers. The lower 
layer is invisible to the user - its functions are not supposed to be called directly 
by MetaPRL users; instead the appropriate calls will be inserted by the MetaPRL 
frontend. 
Internal Interface. The interface contains the following:

    type bookmark

    type ('input, 'intermediate, 'output) description =
       { empty: 'intermediate;
         add: 'intermediate -> 'input -> 'intermediate;
         retrieve: 'intermediate -> 'output }

    val create: string -> ('input, 'intermediate, 'output) description ->
       bookmark -> 'output

    val improve: string -> Obj.t -> unit
    val bookmark: string -> unit
    val extends_theory: string -> unit
    val close_theory: string -> unit

    val find: string * string -> bookmark

The create function takes a name of the resource and a function for turning 
a list of collected data points (in the order that is consistent with the order in 
which they were added to the appropriate theories and the inheritance hierarchy 
order) of the ’ input type into an appropriate result of the ’ output type (usually 
tactic, int -> tactic, or similar) and returns a lookup function that given a 
theorem bookmark will give the value of that resource at that bookmark. 

The lookup function is lazy and it caches both the 'intermediate and the
'output results. For example, if bookmark B extends from a bookmark A and
the lookup is called on bookmark B, then the lookup system will use the add
function to fold all the relevant data into the empty value and will memoize
the 'intermediate values for all the bookmarks it encounters above B in the
inheritance hierarchy (including A and B itself). Next it calls the retrieve
function to get the final data for the bookmark B, memoizes and returns the
resulting data. Next time the lookup is called on bookmark B, it will simply
return the memoized 'output data. Next time the lookup function is called on
the bookmark A, it will only call the retrieve on the memoized 'intermediate
value (and memoize the resulting 'output value as well)¹. Finally, next time the
lookup function is called on another descendant of A, the lookup function will
retrieve the 'intermediate value for A and then add the remaining data to it.

The improve function adds a new data entry to the named resource. Note
that the Obj.t here signifies a shortcut across the type system - the MetaPRL
frontend will add a coercion from the actual input type into an Obj.t and will
also add a type constraint on the expression being coerced to make sure that
this mechanism is still type safe.

The bookmark function adds the named bookmark to all resources (the name
here is usually a name of a theorem); extends_theory tells the resource manager
that the current theory inherits from another one (and that it needs to inherit
the data for all resources); close_theory tells the resource manager that it
has received all the resource data for the named theory and that all the recent
bookmark and extends_theory calls belong to the given theory.

¹ If no new data was added between A and B, then the 'output value for
B will simply be reused for A.

Finally, find finds the bookmark for a given theorem in a given theory. 
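
The caching behaviour of the lookup function described above can be sketched in plain OCaml. This is only a minimal model under stated assumptions, not the MetaPRL implementation: the bookmark record (with its id, parent and local fields) is hypothetical, bookmarks form a simple chain rather than a full inheritance hierarchy, and the reuse of the 'output value mentioned in the footnote is not modelled.

```ocaml
type ('input, 'intermediate, 'output) description = {
  empty : 'intermediate;
  add : 'intermediate -> 'input -> 'intermediate;
  retrieve : 'intermediate -> 'output;
}

(* Hypothetical bookmark model: each bookmark knows its parent and the
   data added since that parent. *)
type 'input bookmark = {
  id : int;                         (* unique key used for memoization *)
  parent : 'input bookmark option;  (* None for the root *)
  local : 'input list;              (* data added since [parent], oldest first *)
}

(* [create d] returns a lookup function that memoizes both the
   intermediate and the output value of every bookmark it visits. *)
let create (d : ('i, 'm, 'o) description) : 'i bookmark -> 'o =
  let inter : (int, 'm) Hashtbl.t = Hashtbl.create 16 in
  let out : (int, 'o) Hashtbl.t = Hashtbl.create 16 in
  let rec intermediate b =
    match Hashtbl.find_opt inter b.id with
    | Some m -> m
    | None ->
      let base =
        match b.parent with
        | None -> d.empty
        | Some p -> intermediate p
      in
      let m = List.fold_left d.add base b.local in
      Hashtbl.add inter b.id m;
      m
  in
  fun b ->
    match Hashtbl.find_opt out b.id with
    | Some o -> o
    | None ->
      let o = d.retrieve (intermediate b) in
      Hashtbl.add out b.id o;
      o

(* Toy resource: sums integers and counts how often [add] is called. *)
let add_calls = ref 0

let sum_desc = {
  empty = 0;
  add = (fun acc x -> incr add_calls; acc + x);
  retrieve = (fun acc -> acc);
}

let a = { id = 0; parent = None; local = [1; 2] }
let b = { id = 1; parent = Some a; local = [3] }
let c = { id = 2; parent = Some a; local = [10] }

let lookup = create sum_desc
```

Looking up b folds the data of both a and b; a later lookup on a, or on the sibling c, starts from the memoized intermediate value of a instead of folding its data again.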



External Interface. The next layer is essentially a user interface layer and 
it consists of statements that would be recognized by the MetaPRL frontend in 
MetaPRL theories. First, in the theory interface files we allow declarations of the 
form 

    resource (input, output) name

where resource is a MetaPRL keyword, input and output are the ML types 
describing the data inputs that the resource is going to get and the resulting 
output type, and name is, unsurprisingly, the name of the resource (must be 
globally unique). 

Whenever a resource is declared in a theory interface, the corresponding
implementation file must define the resource using the

    let (input, output) resource name = expr

construct, where expr is an ML expression of an appropriate description
type. When a resource is defined, the MetaPRL frontend will create a function
get_name_resource of the type bookmark -> output.

All tactics in MetaPRL have access to the “proof obligation” object, which 
includes the bookmark corresponding to the scope of the current proof. By apply-
ing the appropriate get_name_resource function to the current bookmark, the
tactic can get access to the appropriate value of any resource. This mechanism
is purely functional - there is no imperative “context switch” when switching
from one proof to another; instead tactics in each proof have immediate access
to the local value of each resource's 'output data.

Example 3. Once the intro and elim resources (with the tactic and int ->
tactic output types respectively) of Example 2 are defined (Section 6.4 will
describe the implementation in detail), the decomposition tactic could be imple-
mented as simply as

    let dT p n =
       let bookmark = get_bookmark p in
       if n = 0 then
          get_intro_resource bookmark
       else
          get_elim_resource bookmark n

where p is the proof obligation argument and n is the index of the hypothesis to
decompose (index 0 stands for the conclusion).

To add data to a resource, a MetaPRL user only has to write (and as we will 
see in Section 4 even that is usually automated away): 
    let resource name += expr

where expr has the appropriate input type, and the MetaPRL frontend will
translate it into an appropriate improve call. 

The scoping management of the resource data is fully handled by the fron-
tend itself, without requiring any resource-specific input from the user. When-
ever a logical theory is specified as inheriting another logical theory (using the
extends Theory_name statement), the frontend will include the appropriate
extends_theory call. Whenever a new theorem is added to a theory, the frontend
will insert a call to bookmark. Finally, at the end of each theory, the frontend 
will insert a call to close_theory. 
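
The translation of a resource improvement into an improve call, with the type constraint and Obj.t coercion described in Section 3, might look roughly as follows. This is a sketch under assumptions: the store table, the "counter" resource and counter_entries are hypothetical stand-ins, and the code actually emitted by the MetaPRL frontend may differ.

```ocaml
(* Hypothetical stand-in for the internal resource manager: it keeps
   untyped entries for each resource name. *)
let store : (string, Obj.t list) Hashtbl.t = Hashtbl.create 16

let improve (name : string) (data : Obj.t) : unit =
  let old = Option.value (Hashtbl.find_opt store name) ~default:[] in
  Hashtbl.replace store name (data :: old)

(* What the user writes (MetaPRL syntax, not plain OCaml):
     let resource counter += 42
   What a frontend could emit, assuming the resource was declared with
   input type [int]. The constraint (42 : int) is checked by the OCaml
   compiler before the value is coerced to Obj.t. *)
let () = improve "counter" (Obj.repr (42 : int))

(* The typed side recovers the entries, oldest first. This is safe only
   because every insertion went through a constrained coercion. *)
let counter_entries () : int list =
  List.rev_map Obj.obj
    (Option.value (Hashtbl.find_opt store "counter") ~default:[])
```

The design point is that the untyped Obj.t never reaches user code: the type constraint sits next to the coercion on the way in, and only the generated accessor recovers the typed value on the way out.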

4 Reflective Resource Annotations 

Even in the presence of the hierarchical resource mechanism of Section 3, writing 
the appropriate let resource += code every time new definitions and theorems 
are added takes some expertise and could be somewhat time consuming. On the 
other hand, it also turns out that most such resource updates are rather uniform 
and most of the needed information is already present in the system. If the rules 
are expressed using a well-defined logical meta-language (such as the sequent 
schemas language [16] used by MetaPRL), then we can use the text of the rules 
as a source of information. 

Example 4. Suppose a new introduction rule is added to the system and the
user wants to add it to the intro resource. Using the mechanism given in the
previous Section, this might look as²:

    rule xyzl : Γ ⊢ xyz{a; b; c}

    let resource intro += (xyz{a; b; c}, xyzl)

In the above example it is clear that the resource improvement line is pretty
redundant³ - it does not contain anything that cannot be deduced from the
rule itself. If the system is given access both to the text of the rule
and to the primitive tactic for applying the rule⁴, it will have most (if not all) of
the information on how to update the decomposition tactic. By examining the
text of the rule it can see what kind of term is being introduced and create an
appropriate pattern for inclusion into the resource, and it is clear which tactic
should be added to the intro resource - the primitive tactic that would apply
the newly added rule.

By giving tactics access to the text of the rules we make the system a bit
more reflective - it becomes capable of using not only the meaning of the rules
in its proof search, but their syntax as well.

² For clarity, we are using the pretty-printed syntax of MetaPRL terms here in place
of their ASCII representations.

³ The redundancy is not very big here, of course, but in more complicated examples
it can get quite big.

⁴ In the MetaPRL system, all the rules (including derived rules and theorems) are
compiled to a rewriting engine bytecode, so a tactic for applying a primitive rule
does not have direct access to the text of the rule it is applying.
From the MetaPRL user's perspective this mechanism takes the form of resource
annotations. When adding a new rule, a user only has to annotate it with the
names of resources that need to be automatically improved. Users can also pass 
some optional arguments to the automatic procedure in order to modify its 
behavior. As a result, when a new logical object (definition, axiom, theorem, 
derived rule, etc.) is added to a MetaPRL theory, the user can usually update all 
relevant proof search automation by typing only a few extra symbols. 

Example 5. Using the resource annotations mechanism, the code of Exam-
ple 4 above would take the form

    rule xyzl {| intro [] |} : Γ ⊢ xyz{a; b; c}

where the annotation “{| intro [] |}” specifies that the new rule has to be
added to the intro resource.

Example 6. The resource annotation for the ∧ elimination rule in MetaPRL
would be written as {| elim [ThinOption thinT] |} which specifies that the
elim resource should be improved with an entry for the ∧ term and that by de-
fault it should use the thinT tactic to thin out (weaken) the original ∧ hypothesis
after applying the elimination rule. 
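
The idea of deriving a resource entry from the text of an annotated rule can be illustrated with a small model. This is a sketch only: the term, rule and tactic types and the intro_entry function are hypothetical simplifications and are not taken from the MetaPRL sources.

```ocaml
(* Hypothetical, simplified model of terms, rules and tactics. *)
type term = Var of string | Op of string * term list

type rule = {
  name : string;         (* name of the rule *)
  premises : term list;  (* conclusions of the premise sequents *)
  conclusion : term;     (* conclusion of the goal sequent *)
}

(* Stand-in for the primitive tactic that applies a named rule. *)
type tactic = ApplyRule of string

(* Processing an intro annotation: the pattern is read off the rule's
   conclusion and paired with the primitive tactic for that rule. A rule
   whose conclusion is a bare variable yields no usable pattern. *)
let intro_entry (r : rule) : (term * tactic) option =
  match r.conclusion with
  | Op _ as pattern -> Some (pattern, ApplyRule r.name)
  | Var _ -> None

let and_intro = {
  name = "and_intro";
  premises = [Var "A"; Var "B"];
  conclusion = Op ("and", [Var "A"; Var "B"]);
}
```

Here intro_entry and_intro produces the same (pattern, tactic) pair that the user would otherwise have to write by hand in a resource improvement line.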



5 Term Table 

One of the most frequent uses of resources is to construct tactics for term rewrit- 
ing or rule application based on the collection of rewrites and rules in the logic. 
For example, as discussed, the dT tactic selects an inference rule based on the
term to be decomposed. Abstractly, the dT tactic defines a (perhaps large) set 
of rules indexed by a set of terms. 

In a very naive implementation, given a term to decompose, the prover would 
apply each of the rules it knows about in order until one of them succeeds. This 
would of course be very inefficient, taking time linear in the number of rules in 
the logic. There are many other kinds of operations that have the same kind of 
behavior, including syntax-directed proof search, term evaluation, and display 
of terms [8] based on their syntax. 

Abstractly stated, the problem is this: given a set S of (pattern, value) pairs,
and a term t, find a matching pattern and return the corresponding value. In case 
there are several matches, it is useful to have a choice between several strategies: 

— return the most recently added value corresponding to the “most specific” 
match, or 

— return the list of all the values corresponding to the “most specific” match 
(most recently added first), or 

— return the list of all values corresponding to the matching patterns, the ones 
corresponding to “more specific” matches first. 



Note that when values are collected through the resources mechanism, the “most
recent” means the “closest to the leaves of the inheritance hierarchy”. In other
words we want to get the value from the most specialized subtheory first, because
this allows specialized theories to shadow the values defined in the more generic 
“core” theories. 

We call this data structure a term table and we construct it by collecting 
patterns in an incremental manner into a discrimination tree [3,7]. Since the 
patterns we use are higher-order, we simplify them before we add them to the dis- 
crimination tree (thus allowing false positive matches). The original higher-order 
patterns are compiled into the bytecode programs for the MetaPRL rewriting en- 
gine [12], which can be used to test the matches found by the discrimination tree, 
killing off the false positives. 

We begin the description with some definitions. 

5.1 Second-Order Patterns 

MetaPRL represents syntax using the term schemas [16]. Each term schema is 
either an object (first-order) variable v, a second-order meta-variable v, or it has
an operator that represents the “name” of the term drawn from a countable 
set, and a set of subterms that may contain bindings of various arities. The 
second-order variables are used to specify term patterns and substitution for 
rewriting [16]. 

The following table gives a few examples of term syntax, and their conven- 
tional notation. The lambda term contains a binding occurrence: the variable x 
is bound in the subterm b. 



    Displayed form    Term
    λx.b              lambda{ x. b }
    f(a)              apply{ f; a }
    x + y             sum{ x; y }



Second-order variables are used to specify term patterns and substitution
for rewriting. A second-order variable pattern has the form v[v₁; ···; vₙ], which
represents an arbitrary term that may have free variables v₁, ..., vₙ. The corre-
sponding substitution has the form v[t₁; ···; tₙ], which specifies the simultane-
ous, capture-avoiding substitution of terms t₁, ..., tₙ for v₁, ..., vₙ in the term
matched by v.

Example 7. Below are a few examples illustrating our second-order pattern lan- 
guage. For the precise definition of the language, see [16]. 

- Pattern v + v matches the term 1 + 1, but does not match the term 1 + 2.
- The term λx.x + 1 is matched by the pattern λx.v[x], but not by the pattern
  λx.v.
- Pattern λx.v[x] + v[1] matches terms λy.y + 1 and λz.2 * z + 2 * 1.
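
The first item of Example 7 relies on a meta-variable that occurs twice having to match the same term both times. A minimal sketch of that non-linear matching is given below. It deliberately ignores binders and meta-variable arguments, so it is not the second-order matching of [16]; all type and function names are hypothetical.

```ocaml
(* Toy terms: numbers and sums only. *)
type term = Num of int | Sum of term * term

(* Toy patterns: meta-variables without arguments, numbers and sums. *)
type pattern =
  | Meta of string
  | PNum of int
  | PSum of pattern * pattern

(* [pmatch env p t] extends the bindings in [env] so that [p] matches [t],
   or returns None. A meta-variable that is already bound must be bound
   to a structurally equal term. *)
let rec pmatch (env : (string * term) list) (p : pattern) (t : term)
  : (string * term) list option =
  match p, t with
  | Meta v, _ ->
    (match List.assoc_opt v env with
     | None -> Some ((v, t) :: env)
     | Some bound -> if bound = t then Some env else None)
  | PNum n, Num m -> if n = m then Some env else None
  | PSum (p1, p2), Sum (t1, t2) ->
    (match pmatch env p1 t1 with
     | None -> None
     | Some env' -> pmatch env' p2 t2)
  | _ -> None

let v_plus_v = PSum (Meta "v", Meta "v")
```

With these definitions v_plus_v matches Sum (Num 1, Num 1) and fails on Sum (Num 1, Num 2), mirroring the first item of the example.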

5.2 Simplified Patterns 

Before a second-order pattern is added to a discrimination tree, we simplify it by
replacing all second-order variable instances with a “wildcard” pattern and by
re-interpreting first-order variables as matching an arbitrary first-order variable.
The language of simplified patterns is the following:

    p ::= _                        the wildcard pattern
        | operator(p̄₁, ..., p̄ₙ)    a pattern with n subterms
        | v                        stands for an arbitrary first-order variable
    p̄ ::= (i, p)                   i stands for the number of binding occurrences

A matching relation t ~ p can be defined on terms and patterns as follows. First,
any term t matches the wildcard pattern: t ~ _. For the inductive case, the term
t = operator(t̄₁, ..., t̄ₙ) matches the pattern p = operator(p̄₁, ..., p̄ₙ) iff t̄ⱼ ~ p̄ⱼ
for all j ∈ {1, ..., n}. A subterm v₁, ..., vₘ.t matches a subpattern (m, p) iff
t ~ p.

In many cases when the term table is constructed, a term will match several 
patterns. In general, the table should return the most specific match. Any non- 
wildcard pattern is considered more specific than the wildcard one and two 
patterns with the same top-level operator and arities are compared according to 
the lexicographical order of the lists of their immediate subpatterns. 
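
The simplified pattern language, the matching relation and the “more specific” comparison can be written down directly as OCaml data types and functions. The sketch below is one possible reading of the definitions above. The constructor names are hypothetical, and patterns that the text leaves incomparable (different operators or arities) are simply treated as equally specific.

```ocaml
type term =
  | Var of string
  | Term of string * bterm list   (* operator name and subterms *)
and bterm = string list * term    (* bound variables and body *)

type pattern =
  | Wild                          (* the wildcard pattern *)
  | AnyVar                        (* an arbitrary first-order variable *)
  | Pat of string * bpattern list
and bpattern = int * pattern      (* number of binders and body *)

let rec matches (t : term) (p : pattern) : bool =
  match t, p with
  | _, Wild -> true
  | Var _, AnyVar -> true
  | Term (op, ts), Pat (op', ps) ->
    op = op'
    && List.length ts = List.length ps
    && List.for_all2 bmatches ts ps
  | _ -> false
and bmatches (vars, t) (m, p) = List.length vars = m && matches t p

(* Negative if [p] is more specific than [q], positive if less specific,
   0 if equal or incomparable in this sketch. *)
let rec compare_spec (p : pattern) (q : pattern) : int =
  match p, q with
  | Wild, Wild -> 0
  | Wild, _ -> 1
  | _, Wild -> -1
  | Pat (o1, ps), Pat (o2, qs)
    when o1 = o2 && List.map fst ps = List.map fst qs ->
    lex ps qs
  | _ -> 0
and lex ps qs =
  match ps, qs with
  | (_, p) :: ps', (_, q) :: qs' ->
    let c = compare_spec p q in
    if c <> 0 then c else lex ps' qs'
  | _ -> 0

(* lambda{ x. sum{ x; 1 } } *)
let t =
  Term ("lambda",
        [ (["x"], Term ("sum", [ ([], Var "x"); ([], Term ("1", [])) ])) ])

let general = Pat ("lambda", [ (1, Wild) ])
let specific = Pat ("lambda", [ (1, Pat ("sum", [ (0, AnyVar); (0, Wild) ])) ])
```

Both general and specific match t, and compare_spec ranks specific first, which is the order in which a term table would consult them.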

5.3 Implementation 

The term tables are implemented by collecting the simplified patterns into dis- 
crimination trees. Each pattern in the tree is associated with the corresponding 
value, as well as with the rewriting engine bytecode that could be used to check 
whether a potential match is a faithful second-order match. When adding a 
new pattern to a tree, we make sure that the order of the entries in the tree is 
consistent with the “more specific” relation. 

As described in the beginning of this Section, our term table implementation 
provides a choice of several lookup functions. In addition to choosing between 
returning just the most specific match and returning all matches, the caller also 
gets to choose whether to check potential matches with the rewriting engine 
(which is slower, but more accurate) or not (which is faster, but may return 
false positives). 
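
The choice between the lookup functions can be sketched on a naive list-based table. This is an illustration only: a real term table uses a discrimination tree, and the entry record with its approx and exact fields is a hypothetical stand-in for the simplified-pattern test and the rewriting-engine check respectively.

```ocaml
type ('t, 'v) entry = {
  approx : 't -> bool;  (* cheap test; may accept false positives *)
  exact : 't -> bool;   (* expensive, faithful test *)
  value : 'v;
}

(* The table is assumed to be ordered with more specific entries first.
   With [~check:false] only the cheap test is used. *)
let lookup_all ?(check = true) (table : ('t, 'v) entry list) (t : 't) : 'v list =
  List.filter_map
    (fun e ->
       if e.approx t && ((not check) || e.exact t) then Some e.value else None)
    table

let lookup_first ?check table t =
  match lookup_all ?check table t with
  | [] -> None
  | v :: _ -> Some v

(* Toy table over integers: the first entry really wants multiples of
   four, but its cheap test only checks evenness. *)
let table = [
  { approx = (fun n -> n mod 2 = 0);
    exact = (fun n -> n mod 4 = 0);
    value = "multiple of four" };
  { approx = (fun _ -> true);
    exact = (fun _ -> true);
    value = "anything" };
]
```

Looking up 6 without the check returns the false positive "multiple of four", while the checked lookup falls through to "anything".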

In order to take advantage of the caching features of the resources mecha-
nism, the interface for building term tables is incremental: it allows extending
the tables functionally by adding one pattern at a time. In order to make the in-
terface easier to use, we provide a function that returns an appropriate resource
description (see Section 3) where the 'intermediate type is a term table and
only the retrieve field needs to be provided by the user. 
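
A helper of this kind could have roughly the following shape. This is a sketch under assumptions: the association list stands in for a real term table, and table_description is a hypothetical name rather than the MetaPRL function.

```ocaml
type ('input, 'intermediate, 'output) description = {
  empty : 'intermediate;
  add : 'intermediate -> 'input -> 'intermediate;
  retrieve : 'intermediate -> 'output;
}

(* Toy "term table": an immutable association list, newest entry first. *)
type ('k, 'v) table = ('k * 'v) list

(* Only [retrieve] is supplied by the user; [empty] and [add] are fixed
   by the table implementation. *)
let table_description (retrieve : ('k, 'v) table -> 'o)
  : ('k * 'v, ('k, 'v) table, 'o) description =
  { empty = []; add = (fun tbl entry -> entry :: tbl); retrieve }

let d = table_description (fun tbl key -> List.assoc_opt key tbl)

(* Functional extension: [older] is unchanged after building [newer]. *)
let older = d.add d.empty ("and", 1)
let newer = d.add older ("and", 2)
```

Because add returns a new table and leaves the old one intact, an intermediate value memoized at one bookmark can safely be shared by all of its descendants.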



6 Resource-Driven Tactics in MetaPRL 

The resource mechanisms described in the previous sections are a big part of all 
the most commonly used tactics in MetaPRL. In this Section we will describe 
some of the most heavily used MetaPRL resources. The goal of this section,
however, is not to describe some particular MetaPRL tactics, but to give an
impression of the wide range of tactics that the resource mechanisms could be
used to implement. 

Those who are interested in full details of a particular tactic implementation 
may find additional information in [11]. 

6.1 The “n-th Hypothesis” Tactic 

The “conclusion immediately follows from the n-th hypothesis” tactic
(“nthHypT”) is probably the simplest resource-based tactic in MetaPRL. It is
designed to prove sequents like Γ; T; Δ ⊢ T, Γ; x : T; Δ[x] ⊢ T Type, and
Γ; x : Void; Δ[x] ⊢ C[x], where the conclusion of the sequent immediately
follows from a hypothesis.

The nthHypT is implemented via a term table resource that maps terms to 
int -> tactic. The input to the nth_hyp resource is a term containing both
the hypothesis and the conclusion packed together and the output is the cor- 
responding tactic that is supposed to be able to prove in full any goals of that 
form. As in Example 3, the code of the tactic itself takes just a few lines: 

    let nthHypT p n =
       let t = make_pair_term (get_nth_hyp p n) (get_concl p) in
       lookup (get_nth_hyp_resource p) t n

where p is the current proof obligation and n is the hypothesis number (same as 
in Example 3). 

MetaPRL also implements the annotations for this resource - whenever a rule
is annotated with {| nth_hyp |}, MetaPRL would check whether the annotated
rule has the correct form and if so, it will pair its hypothesis with its conclusion
and add the corresponding entry to the nth_hyp resource.

6.2 The Auto Tactic 

In addition to the decomposition tactic we have already mentioned, the generic 
proof search automation tactics are among the most often used in MetaPRL. We 
have two such tactics. The autoT tactic attempts to prove a goal “automatically,” 
and the trivialT tactic proves goals that are “trivial.” The resource mechanism 
allowed us to provide a generic implementation of these two tactics. In fact, the 
implementation turns out to be surprisingly simple - all of the work in automatic 
proving is implemented by the resource mechanism and in descendant theories. 

The auto resource builds collections of tactics specified by a data structure 
with the following type: 

    type auto_info =
       { auto_name : string;
         auto_tac : tactic;
         auto_prec : auto_prec;
         auto_type : auto_type;
       }

    and auto_type =
       AutoTrivial
     | AutoNormal
     | AutoComplete

The auto_name is the name used to describe the entry (for debugging pur- 
poses). The auto_tac is the actual tactic to try. auto_prec is used to divide the 
entries into precedence levels; tactics with higher precedence are applied first. 

Finally, auto_type specifies how autoT and trivialT will use each particular
entry. AutoTrivial entries are the only ones used by trivialT; autoT attempts
using them before any other entries. AutoComplete will be used by autoT after
all AutoTrivial and AutoNormal entries are exhausted; it will consider an ap-
plication of an AutoComplete entry to be successful only if it would be able to
completely prove all subgoals generated by it.

The onSomeHypT nthHypT (“try finding a hypothesis nthHypT would apply
to” - see Section 6.1) is an important part of the trivialT tactic and dT 0 -
the intro part of the decomposition tactic - is an important part of the autoT
tactic. As we will see in Section 6.4, parts of dT 0 are added to the AutoNormal
level of autoT and the rest - at the AutoComplete level.
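
The way autoT and trivialT might consume the collected auto_info entries can be sketched with a toy model. Everything here is an assumption made for illustration: goals are integers, a tactic returns its subgoals or None on failure, auto_prec is a plain int, and the control flow is only one plausible reading of the description above, not the MetaPRL code.

```ocaml
type goal = int                          (* toy goal: 0 is trivially provable *)
type tactic = goal -> goal list option   (* None = the tactic fails *)

type auto_type = AutoTrivial | AutoNormal | AutoComplete

type auto_info = {
  auto_name : string;
  auto_tac : tactic;
  auto_prec : int;       (* assumption: ints stand in for precedences *)
  auto_type : auto_type;
}

(* Entries of one kind, highest precedence first. *)
let of_type ty entries =
  entries
  |> List.filter (fun e -> e.auto_type = ty)
  |> List.stable_sort (fun a b -> compare b.auto_prec a.auto_prec)

let first_success entries g = List.find_map (fun e -> e.auto_tac g) entries

let trivialT entries : tactic = first_success (of_type AutoTrivial entries)

(* Returns the goals that remain unproved. *)
let rec autoT entries (g : goal) : goal list =
  let prove_all subgoals = List.concat_map (autoT entries) subgoals in
  let eager = of_type AutoTrivial entries @ of_type AutoNormal entries in
  match first_success eager g with
  | Some subgoals -> prove_all subgoals
  | None ->
    (* An AutoComplete entry counts only if all its subgoals get proved. *)
    let completes e =
      match e.auto_tac g with
      | Some subgoals -> prove_all subgoals = []
      | None -> false
    in
    if List.exists completes (of_type AutoComplete entries) then [] else [g]

let entries = [
  { auto_name = "zero"; auto_prec = 0; auto_type = AutoTrivial;
    auto_tac = (fun g -> if g = 0 then Some [] else None) };
  { auto_name = "halve"; auto_prec = 0; auto_type = AutoNormal;
    auto_tac = (fun g -> if g > 0 && g mod 2 = 0 then Some [g / 2] else None) };
  { auto_name = "dec"; auto_prec = 0; auto_type = AutoComplete;
    auto_tac = (fun g -> if g mod 2 = 1 then Some [g - 1] else None) };
]
```

With all three entries autoT proves the goal 6 completely; without the AutoComplete entry it gets stuck and returns the odd subgoal 3.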



6.3 Type Inference 

Another very interesting example of a resource-driven approach in MetaPRL is 
type inference. There are several factors that make it stand out. First, the type 
inference resource is used to create a recursive function, with each data item 
responsible for a specific recursive case. Second, the output type of the type 
inference resource is not a tactic, but rather a helper function that can be used 
in various other tactics. 

The type inference resource is complicated (unfortunately), so we will start 
by presenting a very simplified version of the resource. Suppose the pair operator
and the Cartesian product type are added to a logic and we want to augment the
type inference resource. To a first approximation, this would look something like

    let typeinf_pair infer t =
       let a, b = destruct_pair_term t in
       construct_product_term (infer a) (infer b)

    let resource typeinf += ((a, b), typeinf_pair)

where the infer argument to the typeinf_pair function is the type inference 
function itself, allowing for the recursive call. The typeinf resource would be 
implemented as a term lookup table (see Section 5) and will therefore have the 
input type term and the output type inference_fun -> inference_fun, where
inference_fun is defined as term -> term.

Once the table resource is defined, the actual type inference function can be 
defined simply as follows: 
    let infer_type p =
       let table = get_typeinf_resource p in
       let rec infer t = lookup table t infer t in
       infer

where p is the current proof obligation (same as in Example 3). Above, lookup
table t returns the appropriate inference_fun -> inference_fun function
which, given the infer itself, returns the inference_fun function which is then
used to infer the type of the term t. 
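
The open recursion through the table can be tried out on a toy language. In this sketch types are a separate OCaml datatype rather than terms, and the table is a plain list of hypothetical (recognizer, handler) pairs instead of a term table; these are assumptions made to keep the example self-contained.

```ocaml
type term = Int of int | Bool of bool | Pair of term * term
type ty = TInt | TBool | TProd of ty * ty

type inference_fun = term -> ty

(* Hypothetical stand-in for a term table: each entry recognizes a class
   of terms and supplies the handler for that recursive case. *)
type entry = (term -> bool) * (inference_fun -> inference_fun)

let lookup (table : entry list) (t : term) : inference_fun -> inference_fun =
  match List.find_opt (fun (recognizes, _) -> recognizes t) table with
  | Some (_, handler) -> handler
  | None -> failwith "typeinf: no entry for this term"

(* The handler for pairs receives the inference function itself. *)
let typeinf_pair infer t =
  match t with
  | Pair (a, b) -> TProd (infer a, infer b)
  | _ -> invalid_arg "typeinf_pair"

let table : entry list = [
  ((function Pair _ -> true | _ -> false), typeinf_pair);
  ((function Int _ -> true | _ -> false), (fun _ _ -> TInt));
  ((function Bool _ -> true | _ -> false), (fun _ _ -> TBool));
]

let infer_type table =
  let rec infer t = lookup table t infer t in
  infer
```

Each entry handles exactly one recursive case, so adding a new operator to the logic amounts to adding one more entry to the table.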

MetaPRL currently has two different implementations of a type inference 
algorithm. Both implementations are similar to the simplified one outlined above, 
except the inference_fun type would be more complicated.

The first implementation is used as a heuristic for inferring the type of 
expressions in a Martin-Löf style type theory. There the type is defined as 

type typeinf_fun = 
   ty_var_set -> var_env -> eqnlist -> opt_eqns -> var_env -> term 
   -> eqnlist * opt_eqns * var_env * term 

An inference function takes as arguments: a set of variables that should be 
treated as constants when we use unification, a mapping from variable names 
to the types these variables were declared with, a list of equations we have on 
our type variables, a list of optional equations (that could be used when there is 
not enough information in the main equations, but do not have to be satisfied), 
a list of default values for type variables (that can be used when the equations 
do not provide enough information), and a term whose type we want to infer. 
It returns the updated equations, the updated optional equations, the updated 
defaults and the type (possibly containing new variables). The corresponding 
infer_type would call the infer function and then use unification to get the 
final answer. 

The second implementation is used for inferring the type of expressions in an 
ML-like language with a decidable type inference algorithm. It is somewhat 
simpler than the type theory one, but still fairly complicated. 

In the future we hope to add at least partial resource annotation support 
to the type inference resources; however, it is not obvious whether 
we will be able to find a sufficiently general (yet simple) way of implementing 
the annotations. For now the type inference resources have to be maintained by 
manually adding entries to them, which is fairly complicated (even the authors of 
these resources have to look up the inference_fun type definitions once in a 
while to remind themselves of the exact structure of the resource). Because of 
this complexity, the solution is not yet fully satisfactory. 

6.4 Decomposition Tactic 

As we have mentioned in Examples 1, 3 and 5, in MetaPRL the decomposition 
tactic (“dT”) is implemented using two resources. The intro resource is used to 
collect introduction rules; and the elim resource is used to collect elimination 
rules. The components of both resources take a term that describes the shape of 
the goals to which they apply, and a tactic to use when goals of that form are 
recognized. The elim resource takes a tactic of type int -> tactic (the tactic 
takes the number of the hypothesis to which it applies), and the intro resource 
takes a tactic of type tactic. 

The resources also allow resource annotations in rule definitions. Typically, 
the annotation is added to explicit introduction or elimination rules, like the 
following: 



rule and_intro {| intro [] |} : 
   Γ ⊢ A    Γ ⊢ B 
   ---------------- 
   Γ ⊢ A ∧ B 



Once this rule is defined, an application of the tactic dT 0 to a conjunction 
will result in an application of the and_intro rule. 

The intro resource annotations take a list of optional arguments of the 
following type: 



type intro_option = 
   SelectOption of int 
 | IntroArgsOption of (proof_obl -> term -> term) * term option 
 | AutoMustComplete 
 | CondMustComplete of proof_obl -> bool 

The SelectOption is used for rules that require a selection argument. For 
instance, the disjunction introduction rule has two forms, a left-hand and a 
right-hand one. 

rule or_intro_left {| intro [SelectOption 1] |} : 
   Γ ⊢ A 
   ----------- 
   Γ ⊢ A ∨ B 

rule or_intro_right {| intro [SelectOption 2] |} : 
   Γ ⊢ B 
   ----------- 
   Γ ⊢ A ∨ B 

These options require selT arguments: the left rule is applied with selT 1 
(dT 0) and the right rule is applied with selT 2 (dT 0). 

The IntroArgsOption is used to infer arguments to the rule. A typical usage 
would have the form 

rule apply_type {| intro [intro_typeinf a] |} A : 
   Γ ⊢ f ∈ (A → B)    Γ ⊢ a ∈ A 
   ------------------------------ 
   Γ ⊢ (f a) ∈ B 

where intro_typeinf is an appropriate IntroArgsOption option that uses the 
type inference resource (Section 6.3). Once such a rule is added to the system, 
whenever a proof obligation has the form ... ⊢ (f a) ∈ B, dT 0 will attempt 
to infer the type of the corresponding a and use that type as an argument to 
the apply_type rule. 

The AutoMustComplete option can be used to indicate that the autoT tactic 
(see Section 6.2) should not use this rule unless it is capable of finishing the proof 
on its own. This option is used to mark irreversible rules that may take a prov- 
able goal and produce potentially unprovable subgoals. The CondMustComplete 
option is a conditional version of AutoMustComplete; it is used to pass in a 
predicate controlling when to activate the AutoMustComplete. 

The elim resource options are defined with the following type: 

type elim_option = 
   ThinOption of (int -> tactic) 
 | ElimArgsOption of (proof_obl -> term -> term list) * term option 

The ElimArgsOption provides the tactic with a way to find correct rule 
arguments in the same way as the IntroArgsOption does it in the intro case. 
The ThinOption is an argument that provides an optional tactic to “thin” the 
hypothesis after application of the elimination rule. 

The dT resources are implemented as term tables that store the term descrip- 
tions and tactics for “decomposition” reasoning. The dT tactic selects the most 
appropriate rule for a given goal and applies it. The (dT 0) tactic is added to 
the auto resource by default. 
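
To make the table-driven dispatch concrete, here is a small OCaml sketch of a decomposition tactic over propositional sequents. It is an illustration under stated assumptions and not MetaPRL code: the formula and sequent types, the string keys standing in for term tables, and the optional ?sel argument (used here in place of a selection tactical) are all hypothetical.

```ocaml
type formula =
  | Atom of string
  | And of formula * formula
  | Or of formula * formula

type sequent = { hyps : formula list; concl : formula }

(* A tactic maps a goal to its subgoals and raises Failure if it does not apply. *)
type tactic = sequent -> sequent list

type intro_option = SelectOption of int

(* Hypothetical table key: the outermost connective. *)
let head = function
  | Atom _ -> "atom"
  | And _ -> "and"
  | Or _ -> "or"

let and_intro s =
  match s.concl with
  | And (a, b) -> [ { s with concl = a }; { s with concl = b } ]
  | _ -> failwith "and_intro"

let or_intro_left s =
  match s.concl with
  | Or (a, _) -> [ { s with concl = a } ]
  | _ -> failwith "or_intro_left"

let or_intro_right s =
  match s.concl with
  | Or (_, b) -> [ { s with concl = b } ]
  | _ -> failwith "or_intro_right"

(* Elimination tactics take the (1-based) hypothesis number. *)
let and_elim i s =
  match List.nth s.hyps (i - 1) with
  | And (a, b) -> [ { s with hyps = s.hyps @ [ a; b ] } ]
  | _ -> failwith "and_elim"

let intro_table : (string * intro_option list * tactic) list =
  [ ("and", [], and_intro);
    ("or", [ SelectOption 1 ], or_intro_left);
    ("or", [ SelectOption 2 ], or_intro_right) ]

let elim_table : (string * (int -> tactic)) list = [ ("and", and_elim) ]

(* dT 0 decomposes the conclusion, dT i decomposes hypothesis i. *)
let dT ?sel i s =
  if i = 0 then begin
    let matches (k, opts, _) =
      k = head s.concl
      && (match opts, sel with
          | [], _ -> true
          | _, Some n -> List.mem (SelectOption n) opts
          | _, None -> false)
    in
    match List.find_opt matches intro_table with
    | Some (_, _, tac) -> tac s
    | None -> failwith "dT: no applicable introduction rule"
  end else begin
    match List.assoc_opt (head (List.nth s.hyps (i - 1))) elim_table with
    | Some tac -> tac i s
    | None -> failwith "dT: no applicable elimination rule"
  end
```

For example, with a goal whose conclusion is a disjunction, dT ~sel:2 0 picks the right-hand introduction rule, while dT 1 decomposes the first hypothesis.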

6.5 Term Reduction and Simplification Tactic 

The resource mechanisms are also widely used in MetaPRL rewriting tactics. The 
best example of such a tactic is the reduceC reduction and simplification tactic, 
which reduces a term by applying standard reductions. For example, the type 
theory defines several standard reductions, some of which are listed below. When 
a term is reduced, the reduceC tactic applies these rewrites to its subterms in 
outermost order. 

rewrite beta {| reduce |}: (λv.t1[v]) t2 ⟷ t1[t2] 

rewrite pair {| reduce |}: 
   (match (t1, t2) with (u, v) → t3[u, v]) ⟷ t3[t1, t2] 

The reduce resource is implemented as a term table that, given a term, 
returns a rewrite tactic to be applied to that term. The reduceC rewrite tactic 
is then constructed in two phases: the reduceTopC tactic applies the appropriate 
rewrite to a term without examining subterms, and the reduceC is constructed 
from tacticals (rewrite tacticals are also called conversionals), as follows: 

let reduceC = repeatC (higherC reduceTopC) 

The higherC conversional searches for the outermost subterms where a rewrite 
applies, and the repeatC conversional applies the rewrite repeatedly until no 
more progress can be made. 
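
The two-phase construction can be sketched in plain OCaml. This is only an illustration under assumptions: conversions are modelled as functions returning an option (MetaPRL's conversions are not implemented this way), the toy term type is hypothetical, and reduceTopC is hard-coded where MetaPRL would consult the reduce term table.

```ocaml
type term =
  | Var of string
  | Pair of term * term
  | Fst of term
  | Snd of term

(* A conversion returns Some t' when it rewrites t and None when it does not apply. *)
type conv = term -> term option

(* Stand-in for the reduce resource: rewrites at the root only. *)
let reduceTopC : conv = function
  | Fst (Pair (a, _)) -> Some a
  | Snd (Pair (_, b)) -> Some b
  | _ -> None

(* Apply c at the outermost positions where it succeeds. *)
let rec higherC (c : conv) : conv = fun t ->
  match c t with
  | Some t' -> Some t'
  | None ->
    (match t with
     | Var _ -> None
     | Fst a -> Option.map (fun a' -> Fst a') (higherC c a)
     | Snd a -> Option.map (fun a' -> Snd a') (higherC c a)
     | Pair (a, b) ->
       (match higherC c a, higherC c b with
        | None, None -> None
        | a', b' ->
          Some (Pair (Option.value a' ~default:a, Option.value b' ~default:b))))

(* Repeat c until it fails or stops changing the term; always succeeds. *)
let rec repeatC (c : conv) : conv = fun t ->
  match c t with
  | Some t' when t' <> t -> repeatC c t'
  | _ -> Some t

let reduceC = repeatC (higherC reduceTopC)
```

Because higherC tries the root before descending, an outer redex is always contracted before any redex inside it.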

7 Conclusions and Related Work 

Discrimination trees and other term-indexed data structures are a standard 
technique in a large number of theorem provers. The main novelties of our term 
tables approach are the integration with the rest of the resources mechanism 
and the use of the rewriting engine to perform matching against second-order 
patterns. 

A number of provers include some term-indexed and/or context-sensitive 
and/or updatable tactics. Examples include the decomposition and Auto tactics 
in NuPRL [14], simplification tactics in Isabelle [17], and table tactics in the Ergo 
theorem prover [19]. Our goal, however, was to provide a generic mechanism 
that allows for easy creation of new scoped updatable tactics and, even more 
importantly, provides a very simple mechanism for updating all these tactics in 
a consistent fashion. 

While initially the mechanisms presented in this paper were only meant to 
simplify the implementation of a few specific tactics (mainly the decomposition 
tactic), the simplicity and ease of use of this approach gradually turned it into 
the core mechanism for implementing and maintaining tactics in the MetaPRL 
theorem prover. 

Abstract. Proof reuse addresses the issue of how proofs of theorems in a 
specific setting can be used to prove other theorems in different settings. 
This paper proposes an approach where theorems are generalised by ab- 
stracting their proofs from the original setting. The approach is based 
on a representation of proofs as logical framework proof terms, using the 
theorem prover Isabelle. The logical framework allows type-specific in- 
ference rules to be handled uniformly in the abstraction process and the 
prover’s automated proof tactics may be used freely. This way, estab- 
lished results become more generally applicable; for example, theorems 
about a data type can be reapplied to other types. The paper also consid- 
ers how to reapply such abstracted theorems, and suggests an approach 
based on mappings between operations and types, and on systematically 
exploiting the dependencies between theorems. 



1 Introduction 

Formal proof and development requires considerable effort, which can be re- 
duced through reuse of established results. Often, a new datatype or theory 
resembles a previously developed one and there is considerable gain if theorems 
can carry over from one type to another. Previous work in this area addresses 
reuse by proof or tactic modification in response to changes in the proof goal 
such as modifying the constructors of a datatype, or unfortunate variable in- 
stantiations during a proof search [6,7,14,22,26]. In contrast, type-theoretic 
approaches [12, 13,20] investigate the generalisation and modification of proofs 
by transforming the associated proof terms in the context of constructive type 
theory. This paper proposes a method for abstracting previously established the- 
orems by proof transformations in a logical framework with proof terms. Logical 
frameworks are particularly well-suited for this approach, because inference rules 
are represented as formulae in the formalism; the choice of object logic becomes 
independent of the meta-logic in which the proof terms live. 

The method we propose has been implemented in Isabelle [16], using the proof 
terms recently added by Berghofer and Nipkow [4]. Isabelle offers a wide range 
of powerful tactics and libraries, and we can work in any of the logics encoded 
into Isabelle, such as classical higher-order logic (HOL), Zermelo-Fraenkel set 
theory (ZF), and various modal logics [17]. However, the approach should be 
applicable to any logical framework style theorem prover. 

The paper is organised as follows. Sect. 2 presents the different proof transformations 
of the abstraction method and Sect. 3 discusses how the transformations 
are implemented as functions on Isabelle’s proof terms. Sect. 4 considers how to 
reuse abstracted theorems in a different setting, and demonstrates our approach 
in practice. Sect. 5 considers related work and we conclude in Sect. 6. 



2 Generalising Theorems by Proof Transformation 

This section proposes a method for abstracting theorems in logical frameworks by 
means of proof transformations, in order to derive generally applicable inference 
rules from specific theorems. A logical framework [8, 19] is a meta-level inference 
system which can be used to specify other, object-level, deductive systems. Well-known 
examples of implementations of logical frameworks are Elf [18], $\lambda$Prolog 
[15], and Isabelle [17]. The work presented uses Isabelle, the meta-logic of which 
is intuitionistic higher-order logic extended with Hindley-Milner polymorphism 
and type classes. 

In the logical framework, the formulae of an object logic are represented by 
higher-order abstract syntax and object logic derivability by a predicate on the 
terms of the meta-logic: meta-level implication reflects object-level derivability. 
Object logics are represented by axioms encoding the axioms and inference 
rules of the object logic. The meta-logic is typed, with a special type prop of 
logical formulae (propositions); object logics extend the type system. The 
meta-level quantifier $\bigwedge$ can range over terms of any type, including prop. The logical 
framework allows us to prove theorems directly in the meta-logic. The correct- 
ness of all instantiations of a meta-logic theorem, or schema, follows from the 
correctness of the representation of the rules of the object logic. Theorems es- 
tablished in the meta-logic are derived inference rules of the object logic. Hence, 
new object logic inference rules can be derived within the logical language. 

For the presentation of the abstraction method, we consider a proof $\pi$ of 
a theorem $\varphi$, consisting of a series of inference steps in the meta-logic. The 
proposed generalisation process will transform $\pi$ in a stepwise manner into a 
proof of a schematic theorem which may be instantiated in any other setting, 
i.e. a derived inference rule of the logic. The process consists of three phases: 

1. making assumptions explicit; 

2. abstracting function symbols; 

3. abstracting type constants. 

Each step in this process results in a proof of a theorem, obtained by transform- 
ing the proof of the theorem from the previous step. In order to replace function 
symbols by variables, all relevant information about these symbols, such as defin- 
ing axioms, must be made explicit. In order to replace a type constant by a type 
variable, function symbols of this type must have been replaced by variables. 
Hence, each phase of the transformation assumes that the necessary steps of the 
previous phases have already occurred. The final step results in a proof $\pi'$ from 
which we derive a schematic theorem $\varphi'$, where $\varphi'$ is a modification of 




154 Einar Broch Johnsen and Christoph Lüth 



the initial formula $\varphi$. In such theorems, the premises are called applicability 
conditions as they identify theorems that are needed to successfully apply the 
derived rule. A necessary precondition for the second abstraction step is that the 
logical framework allows for higher-order variables, and for the third step that 
the logical framework allows for type variables. 

It is in principle possible to abstract over all theorems, function symbols, 
and types occurring in a proof. However, such theorems are hard to use; for 
applicability, it is essential to strike a balance between abstracting too much 
and too little. Some tactics guiding the application of abstracted theorems are 
considered in Sect. 4. 



2.1 Making Proof Assumptions Explicit 

In tactical theorem provers such as Isabelle, the use of auxiliary theorems in a 
proof may be hidden to the user, due to the automated proof techniques. These 
contextual dependencies of a theorem can be made explicit by inspecting its 
proof term. In a natural deduction proof, auxiliary theorems can be introduced 
as leaf nodes in open branches of the proof tree. 

Given an open branch with a leaf node theorem in the proof, we can close the 
branch by the implication introduction rule, thus transforming the conclusion of 
the proof. By closing all open branches in this manner, every auxiliary theorem 
used in the proof becomes visible in the root formula of the proof. To illustrate 
this process, let us reconsider the proof $\pi$ of theorem $\varphi$. At the leaf node of an 
open branch in the proof we find a theorem, say $\psi_i(x_1, \ldots, x_k)$. We close the 
branch $\pi_i$ by applying $\Longrightarrow$-introduction at the root of the proof, which leads 
to a proof of a formula $(\bigwedge x_1 \ldots x_k.\ \psi_i(x_1, \ldots, x_k)) \Longrightarrow \varphi$, where $\psi_i$ has been 
transformed into a closed formula by quantifying over free variables, to respect 
variable scoping. The transformation of a branch is illustrated in Figure 1. This 
process is repeated for every branch in $\pi$ with a relevant theorem in its leaf 
node. If we need to make $j$ theorems explicit, we thereby derive a proof $\pi'$ of 
the formula $(\psi_1 \wedge \ldots \wedge \psi_j) \Longrightarrow \varphi$. 

Generally, we may assume that a leaf node theorem is stronger than necessary 
for the specific proof. Therefore, it is possible to modify the applicability 
conditions of the derived theorem in order to make these easier to prove in a 
new setting. For example, if $\psi_i$ is simplified by an elimination rule in the branch, 
we may (repeatedly) cut off the branch above the weaker theorem before closing 
the branch. Proofs in higher-order natural deduction can be converted into a 
normal form where all elimination rules appear above the introduction rules in 
each branch of the proof [21]. With this procedure, proofs in normal form result 
in the weakest possible applicability conditions, but proofs in normal form are 
not required for the abstraction process and proof normalisation is therefore not 
considered in this paper. Furthermore, if $\psi_i$ is the leaf node theorem of an open 
branch in the proof $\pi$ and all leaf node theorems in open branches in the proof 
of $\psi_i$ are included among the leaf node theorems of other open branches of $\pi$, 
expanding $\pi$ with the proof of $\psi_i$ at appropriate leaf nodes before the proof 
transformation will remove superfluous applicability conditions from the derived 

Fig. 1. The transformation and closure of a branch in the proof, binding the free 
variable x of the leaf node formula. 



theorem. The usefulness of these improvements depends on the form of the proof 
and may cause considerable growth in the size of the proof term. An alterna- 
tive is to consider the dependency graph between theorems (see Sect. 4.4). Our 
present approach is to transform the proof as it is given. 



2.2 Abstracting Function Symbols 

The next phase of the transformation process consists of replacing function sym- 
bols by variables. When all implicit assumptions concerning a function symbol 
F have been made explicit, as in the transformed theorem above, all relevant in- 
formation about this function symbol is contained within the new theorem. The 
function symbol has become an eigenvariable because the proof of the theorem 
is independent of the context with regard to this function symbol. Such function 
symbols can be replaced by variables throughout the proof. Let $\varphi[x/t]$ and $\pi[x/t]$ 
denote substitution, replacing $t$ by $x$ in a formula $\varphi$ or proof $\pi$, renaming bound 
variables as needed to avoid variable capture. 

A central idea in logical framework encodings is to represent object logic variables 
by meta-logic variables [19], which are placeholders for meta-logic terms. 
Hereafter, all free variables will be meta-variables and the abstraction process 
replaces function symbols by meta-variables. If the function symbol $F$ is of type 
$\tau$ and $\alpha$ is a meta-variable of this type, the theorem $(\psi_1 \wedge \ldots \wedge \psi_j) \Longrightarrow \varphi$ may 
be further transformed into 

\[ (\psi_1[\alpha/F] \wedge \ldots \wedge \psi_j[\alpha/F]) \Longrightarrow \varphi[\alpha/F], \tag{1} \] 

by transforming the proof $\pi'$ into a new proof $\pi'[\alpha/F]$. 

2.3 Abstracting Types 

When all function symbols depending on a given type have been replaced by term 
variables, the name of the type is arbitrary. In fact, we can now replace such 
type constants by free type variables. The higher-order resolution mechanism of 
the theorem prover will then instantiate type variables as well as term variables 
when we attempt to apply the derived inference rule to a proof goal. However, 
the formal languages used by theorem provers have structured types which may 
give rise to type-specific inference rules. When these occur in the proofs, they 
must also be made explicit for type abstraction to work. This is illustrated by 
the following example. 



2.4 Example 

We assume as object logic a higher-order equational logic, with axioms including 
symmetry (sym), reflexivity (refl), etc. In this object logic, consider a theory 
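including the standard operations $0$, $S$, $+$, and axioms defining addition and 
induction on the type $N$ of natural numbers: 

\[
\begin{array}{l}
\mathit{ax1} \equiv x + 0 = x \qquad \mathit{ax2} \equiv x + S\,y = S(x + y) \\
\mathit{ind} \equiv [\![\, P(0);\ \bigwedge x.\ P(x) \Longrightarrow P(S\,x) \,]\!] \Longrightarrow P(x)
\end{array}
\]

A proof of $x + 0 = 0 + x$ in this theory is as follows, slightly edited for brevity: 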



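\[
\dfrac{
  0 + 0 = 0 + 0
  \qquad
  \dfrac{
    \dfrac{
      \dfrac{
        \dfrac{\mathit{ax1} \qquad [t + 0 = 0 + t] \qquad \mathit{ax2}}
              {S\,t + 0 = S(t + 0) \qquad S(t + 0) = 0 + S\,t}
      }{S\,t + 0 = 0 + S\,t}\ \mathit{trans}
    }{t + 0 = 0 + t \Longrightarrow S\,t + 0 = 0 + S\,t}\ {\Longrightarrow}\text{-intro}
  }{\bigwedge t.\ t + 0 = 0 + t \Longrightarrow S\,t + 0 = 0 + S\,t}\ {\bigwedge}\text{-intro}
}{x + 0 = 0 + x}\ {\Longrightarrow}\text{-elim}
\tag{2}
\]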

Applying the first step of the abstraction method, all theorems from the theory 
become assumptions, which results in a proof of the following theorem: 



\[
[\![\, [\![\, P(0);\ \bigwedge t.\ P(t) \Longrightarrow P(S\,t) \,]\!] \Longrightarrow P(x);\ \ x + 0 = x;\ \ x + S\,y = S(x + y) \,]\!]
\Longrightarrow x + 0 = 0 + x
\]



In the second and third step of the process, we first replace $0$, $S$, and $+$ by the 
meta-variables $a$, $b$, and $c$, respectively. When this is done, we can replace the 
type constant $N$ with a free type variable $\alpha$, resulting in a proof of the theorem 

\[
[\![\, [\![\, P(a);\ \bigwedge t.\ P(t) \Longrightarrow P(b(t)) \,]\!] \Longrightarrow P(x);\ \ c(x, a) = x;\ \ c(x, b(y)) = b(c(x, y)) \,]\!]
\Longrightarrow c(x, a) = c(a, x),
\]

which can be applied as an inference rule to a formula of any type. In order to 
discharge the applicability conditions of the inference rule, the formula repre- 
senting the induction rule must be a theorem for the new type. 



3 Implementation of the Abstraction Techniques 

Under the Curry-Howard isomorphism, proofs correspond to terms in a typed 
$\lambda$-calculus. We have implemented the abstraction processes from Sect. 2 in the 




Theorem Reuse by Proof Term Transformation 



157 



theorem prover Isabelle, which records proofs as meta-logic proof terms. The user 
can use all of Isabelle’s automatic and semi-automatic proof infrastructure and 
Isabelle automatically constructs the corresponding meta-logic proof term [4]. 
Given a proof term, a theorem may be derived by replaying the meta-logic 
inference rules. We use this facility to derive new theorems: Given a theorem to 
abstract, we obtain its proof term, perform appropriate transformations on the 
proof term and replay the derived proof term to obtain a generalised theorem. 
Hence, the correctness of the derived theorem is guaranteed by Isabelle’s 
replay facility for proof terms. The implementation of abstraction functions does 
not impose any restrictions on the proof or the theorem: The abstraction process 
can be applied to any theorem, including those from Isabelle’s standard libraries. 

3.1 Proof Terms 

This section introduces Isabelle’s proof terms, which may be presented as 

\[ p ::= h \mid c_{[\overline{\tau_n/\alpha_n}]} \mid \lambda h : \varphi.\ p \mid \lambda x :: \tau.\ p \mid p \cdot p \mid p\ t \tag{3} \] 

where $h$, $c$, $x$, $t$, $\varphi$, $\alpha$, and $\tau$ denote proof variables, proof constants, term variables, 
terms of arbitrary type, propositions, type variables, and types, respectively. 
The language defined by (3) allows for abstraction over term and proof 
variables, and application of proofs and terms to proofs, corresponding to the 
introduction and elimination of $\bigwedge$ and $\Longrightarrow$. Proof terms live in an environment 
which maps proof variables to terms representing propositions and term variables 
to their type. Proof constants correspond to axioms or already proved theorems. 
For more details and formal definitions, including the definition of provability in 
this setting, see [4]. 

Proof terms can be illustrated by the example of Proof (2). We identify 
theorem names with proof constants: ax1, ax2, refl, etc. The leftmost branch of 
the proof consists of the axiom $\mathit{refl} \equiv x = x$, with $x$ instantiated by $0$. This is 
reflected by the proof term 

\[ \pi_1 = \mathit{refl}\ 0. \] 

The middle branch introduces a meta-implication in the proof term 

\[ \pi_2 = (\lambda H : (\bigwedge x : N.\ x + 0 = 0 + x).\ \psi), \] 

where $\psi$ represents the body of the proof term (omitted here). The proof variable 
$H$ represents an arbitrary proof of the proposition $\bigwedge x : N.\ x + 0 = 0 + x$ 
and is introduced by proof term $\lambda$-abstraction. We can refer to a proof of this 
proposition in the proof term $\psi$ by the proof variable $H$. The whole proof term 
for (2) becomes 

\[ \pi = \mathit{ind}\ (\lambda u.\ u + 0 = 0 + u)\ x \cdot \pi_1 \cdot \pi_2. \] 

The premises $\pi_1$ and $\pi_2$, which correspond to the base case and the induction 
step, are applied to the induction rule ind, reflecting elimination of meta-implication. 
In contrast to proof level $\lambda$-abstraction, term level $\lambda$-abstraction 
allows the higher-order variable $P$ in ind to be instantiated with $\lambda u.\ u + 0 = 0 + u$. 
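
As an illustration of how such proof terms might be represented as data, the following OCaml sketch encodes a simplified form of grammar (3) and builds the example term for Proof (2). It rests on several assumptions: the constructor names are hypothetical, type annotations on term abstractions are dropped, proof variables are referred to by name, and the omitted body of the middle branch is represented by a placeholder proof variable. Isabelle's own proof term datatype differs in detail.

```ocaml
(* Object-level terms, just rich enough for the example. *)
type term =
  | V of string                 (* term variable *)
  | C of string                 (* constant such as "0" or "+" *)
  | App of term * term
  | Lam of string * term        (* term-level abstraction *)
  | All of string * term        (* meta-quantifier over a proposition *)
  | Eq of term * term

(* Simplified grammar (3). *)
type proof =
  | PVar of string                    (* proof variable h *)
  | PConst of string                  (* proof constant c *)
  | PAbs of string * term * proof     (* abstraction over a proof variable *)
  | TAbs of string * proof            (* abstraction over a term variable *)
  | PApp of proof * proof             (* application of a proof to a proof *)
  | TApp of proof * term              (* application of a proof to a term *)

let zero = C "0"
let plus a b = App (App (C "+", a), b)

(* Leftmost branch: refl instantiated with 0. *)
let pi1 = TApp (PConst "refl", zero)

(* Middle branch; "psi" is a placeholder for the omitted body. *)
let hyp = All ("x", Eq (plus (V "x") zero, plus zero (V "x")))
let pi2 = PAbs ("H", hyp, PVar "psi")

(* Whole proof: ind applied to the predicate, to x, and to both premises. *)
let p_inst = Lam ("u", Eq (plus (V "u") zero, plus zero (V "u")))
let pi = PApp (PApp (TApp (TApp (PConst "ind", p_inst), V "x"), pi1), pi2)

(* Proof constants occurring in a proof term, i.e. the theorems it uses. *)
let rec constants = function
  | PVar _ -> []
  | PConst c -> [ c ]
  | PAbs (_, _, p) | TAbs (_, p) | TApp (p, _) -> constants p
  | PApp (p, q) -> constants p @ constants q
```

Since the body of the middle branch is only a placeholder here, constants pi lists ind and refl but not the axioms used inside the omitted part.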

3.2 Implementing Abstraction by Proof Term Transformations 

The abstractions presented in Sect. 2 are implemented as functions which take 
theorems to theorems by transforming proof terms. 

In proof terms, assumptions are represented by proof constants corresponding 
to previously proved theorems. For example, the proof term $\pi$ above contains 
proof constants ax1, ax2 and ind, which can now be lifted to applicability conditions 
as described in Sect. 2.1. This is done by adding a proof term $\lambda$-abstraction 
outside the proof term and replacing occurrences of the theorem inside the proof 
term with an appropriate variable. After abstraction over ax1 and ax2 (omitting 
the lengthy but similar abstraction over ind), we obtain the proof term 

\[ \lambda H : (\bigwedge x\,y : N.\ x + S\,y = S(x + y)).\ \lambda H' : (\bigwedge x : N.\ x + 0 = x).\ \pi[H'/\mathit{ax1},\, H/\mathit{ax2}] \] 

which can be replayed to yield the following theorem: 

\[ [\![\, \forall x\, y.\ x + S\,y = S(x + y);\ \forall x.\ x + 0 = x \,]\!] \Longrightarrow x + 0 = 0 + x \] 

Internally, de Bruijn indices are used for bound variables, which explains the 
occurrence of $H'$ in the second proof term. This gives a first simple version 
of the theorem abstraction function: traverse the proof tree, replace all nodes 
referring to the theorem we want to abstract over with the appropriate de Bruijn 
index, and add a $\lambda$-abstraction in front of the proof term. 
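
The traversal just described can be sketched in a few lines of OCaml. This is a minimal illustration and not the Isabelle implementation: the proof type is a hypothetical cut-down representation with only proof-level constructs, propositions are kept as plain strings, and the input proof is assumed to be closed (it contains no loose de Bruijn indices).

```ocaml
type proof =
  | Bound of int                (* de Bruijn index of a proof variable *)
  | Const of string             (* proof constant: axiom or proved theorem *)
  | Abs of string * proof       (* abstraction; the string is the proposition *)
  | App of proof * proof        (* application of a proof to a proof *)

(* Replace each occurrence of the constant thm by the index that refers to
   a binder placed outside the whole term; depth counts binders passed so far.
   Existing indices are left alone because the input is assumed closed. *)
let rec replace thm depth = function
  | Bound i -> Bound i
  | Const c when c = thm -> Bound depth
  | Const c -> Const c
  | Abs (phi, p) -> Abs (phi, replace thm (depth + 1) p)
  | App (p, q) -> App (replace thm depth p, replace thm depth q)

(* Abstract over one theorem: bind its statement outside the proof term
   and redirect all uses of the theorem to the new binder. *)
let abstract_thm (thm, statement) p = Abs (statement, replace thm 0 p)
```

Applying abstract_thm first for ax1 and then for ax2 yields a term whose outer binder stands for ax2 and whose inner binder stands for ax1; uses of ax2 underneath further binders receive correspondingly larger indices.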

When we use a theorem in a proof, both schematic and type variables are 
instantiated. If we make the theorem an applicability condition we need to quan- 
tify over both the schematic and type variables, hence the meta-quantification 
in H and H' above. However, abstraction over type variables is not possible in 
the Hindley-Milner type system of Isabelle’s meta-logic, where type variables are 
always implicitly quantified at the outermost level. Instead, distinct assumptions 
must be provided for each type instance. For example, a proof of the theorem 

\[ \mathit{map}\ (f \circ g)\ x = \mathit{map}\ f\ (\mathit{map}\ g\ x), \tag{4} \] 

contains three different type instances of the definition of map for non-empty 
lists, $\mathit{map}\ f\ (\mathit{Cons}\ x\ y) = \mathit{Cons}\ (f\ x)\ (\mathit{map}\ f\ y)$. 

At the implementation level, abstracting operations (Sect. 2.2) and types 
(Sect. 2.3) is more straightforward. Traversing the proof term we replace opera- 
tions and types by schematic and type variables, respectively. When abstracting 
over polymorphic operations, we need distinct variables for each type instance of 
the operation symbol, similar to the theorems above. If we consider map in The- 
orem (4), we need to abstract over each of the three type instances separately, 
resulting in three different function variables. 



3.3 Abstraction over Theories 

The previously defined elementary abstraction functions operate on single theo- 
rems, operations, and types. For a more high-level approach, abstraction tactics 
may be defined, which combine series of elementary abstraction steps. 

An example of such a tactic is abstraction over theories. A theory in Isabelle 
can be thought of as a signature defining type constructors and operations, and 
a collection of theorems. Theories are organised hierarchically, so all theorems 
established in ancestor theories remain valid. 

The tactic abstracts a theorem which belongs to a theory $T_1$ into an ancestor 
theory $T_2$. It collects all theorems, operations, and types from the proof term 
which do not occur in $T_2$, and applies elementary tactics recursively to abstract 
over each, starting with theorems and continuing with function symbols and 
types. Finally, the derived proof term is replayed in the ancestor theory, thus 
establishing the validity of the abstracted theorem in the theory $T_2$.¹ 

Abstraction over all theorems, function symbols, and types will generally lead 
to theorems which are hard to reuse. In the next section, we will consider tactics 
which aid in the abstraction and reuse of abstracted theorems. 

4 Reapplying Abstracted Theorems 

This section considers different examples of abstraction, and scenarios to reap- 
ply abstracted theorems. As part of our experimentation with the abstraction 
method, we have generalised approximately 200 theorems from Isabelle’s li- 
braries, and reapplied these. A systematic approach to reapplication is suggested 
in order to facilitate reuse of sets of theorems. 



4.1 Simple Abstraction and Reuse 

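A simple example of abstraction and reuse is to derive a theorem about natural 
numbers by abstraction from the theorem $\mathit{append\_Nil2} \equiv x \mathbin{@} [\,] = x$ about lists. 
Applying the abstraction tactic abs_to_thy described in Sect. 3.3, we derive a 
theorem independent of the theory of lists: 

\[
\begin{array}{l}
[\![\, \forall P\ l.\ [\![\, P\ \mathit{nil};\ \forall a\ l.\ P\ l \Longrightarrow P\ (\mathit{cons}\ a\ l) \,]\!] \Longrightarrow P\ l; \\
\ \ \forall y.\ \mathit{app}\ \mathit{nil}\ y = y; \\
\ \ \forall u\ x\ y.\ \mathit{app}\ (\mathit{cons}\ u\ x)\ y = \mathit{cons}\ u\ (\mathit{app}\ x\ y) \,]\!] \Longrightarrow \\
\mathit{app}\ x\ \mathit{nil} = x
\end{array}
\tag{5}
\]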

The abstraction process introduces new names for the constant [] and the infix 
operator @, favouring lexically suitable variable names. 

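We can now use this theorem to show that $x + 0 = x$. We proceed in two 
stages: we first instantiate the variables nil with $0$, app with $+$ and cons with 
$\lambda x.\ \mathit{Suc}$ (note we need the vacuous argument $x$ here). This yields 

\[
\begin{array}{l}
[\![\, \forall P\ l.\ [\![\, P\ 0;\ \forall a\ l.\ P\ l \Longrightarrow P\ (\mathit{Suc}\ l) \,]\!] \Longrightarrow P\ l; \\
\ \ \forall y.\ 0 + y = y; \\
\ \ \forall u\ x\ y.\ \mathit{Suc}\ x + y = \mathit{Suc}\ (x + y) \,]\!] \Longrightarrow x + 0 = x.
\end{array}
\]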

¹ Due to Isabelle’s type classes, an operation which is defined in T may not occur in 
the signature of T directly; in this case, the user has to give the operation explicitly. 

The premises correspond to well-known theorems about natural numbers (induction 
and the definition of $+$). Resolving with these, we obtain the theorem 
$x + 0 = x$. Apart from the small simplification step required by moving from a 
parametric to a non-parametric type, this process can be completely automated. 
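
The instantiation step can be mimicked in OCaml by treating the abstracted operations as a record and supplying two instances. The sketch below is purely illustrative and every name in it is hypothetical; it tests the two equational premises and the conclusion on sample values, which is of course not a proof, and it does not model the induction premise.

```ocaml
(* The abstracted operations: carrier 'l, element type 'a. *)
type ('a, 'l) sig_ = {
  nil : 'l;
  cons : 'a -> 'l -> 'l;
  app : 'l -> 'l -> 'l;
}

(* Original setting: lists with append. *)
let list_inst : (int, int list) sig_ =
  { nil = []; cons = (fun a l -> a :: l); app = ( @ ) }

(* New setting: naturals with addition; cons ignores its first,
   vacuous argument, as the successor instantiation in the text does. *)
let nat_inst : (unit, int) sig_ =
  { nil = 0; cons = (fun _ n -> n + 1); app = ( + ) }

(* Check both equational premises and the conclusion on sample values. *)
let check s ~elems ~carriers =
  let left_unit = List.for_all (fun y -> s.app s.nil y = y) carriers in
  let cons_step =
    List.for_all
      (fun u ->
        List.for_all
          (fun x ->
            List.for_all
              (fun y -> s.app (s.cons u x) y = s.cons u (s.app x y))
              carriers)
          carriers)
      elems
  in
  let right_unit = List.for_all (fun x -> s.app x s.nil = x) carriers in
  left_unit && cons_step && right_unit
```

An instance that violates a premise, for example one that maps nil to 1 while keeping addition, is rejected by check.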

4.2 Change of Data Representation 

A more challenging situation occurs when we want to implement a datatype by 
another one. For example, suppose we implement the unary representation of 
natural numbers by a binary representation, which may be given as follows: 

datatype bNat = Zero          datatype Pos = One 
              | PBin Pos                   | Bit Pos bool 

The standard functions on Nat may be defined by means of bit operations in 
the binary number representation. We first define the successor functions bSucc 
on bNat and pSucc on Pos by primitive recursion. The latter is defined as 

pSucc One = Bit One False 
pSucc (Bit x b) = if b then Bit (pSucc x) False 
                  else Bit x True 

Subsequently, we define binary addition bPlus by primitive recursion by 

bPlus Zero x = x 
bPlus (PBin x) y = (case y of Zero ⇒ PBin x 
                            | PBin y ⇒ PBin (pPlus x y)) 

For Pos, we get: 

pPlus One y = pSucc y 
pPlus (Bit x b1) y = 
  (case y of One ⇒ pSucc (Bit x b1) 
           | (Bit z b2) ⇒ Bit (pPlus x (if (b1 ∧ b2) then pSucc z 
                                         else z)) 
                              (b1 ≠ b2)) 

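We show how to prove bPlus x Zero = x by reusing the abstracted form (5) 
of theorem append_Nil2 (xs @ Nil = xs). This time we instantiate nil with 
Zero, app with bPlus and cons with $\lambda x.\ \mathit{bSucc}$, and obtain the theorem 

\[
\begin{array}{l}
[\![\, \forall P\ l.\ [\![\, P\ \mathit{Zero};\ \forall a\ l.\ P\ l \Longrightarrow P\ (\mathit{bSucc}\ l) \,]\!] \Longrightarrow P\ l; \\
\ \ \forall x.\ \mathit{bPlus}\ \mathit{Zero}\ x = x; \\
\ \ \forall u\ x\ y.\ \mathit{bPlus}\ (\mathit{bSucc}\ x)\ y = \mathit{bSucc}\ (\mathit{bPlus}\ x\ y) \,]\!] \Longrightarrow \mathit{bPlus}\ x\ \mathit{Zero} = x
\end{array}
\tag{6}
\]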

The first premise corresponds to the induction scheme, and the second and third 
premises correspond to the primitive recursive definition of addition on natu- 
ral numbers. The induction principle on Pos is given by the structure of the 
datatype, not by natural induction. For the first premise, we therefore need to 
show that the usual natural induction rule can be derived for bNat. This is done 
by first establishing an isomorphism between Nat and bNat, i.e. two functions 
$\mathit{n2b} : \mathit{nat} \to \mathit{bNat}$ and $\mathit{b2n} : \mathit{bNat} \to \mathit{nat}$ which are shown to be mutually inverse. 
The second premise is given by the definition of bPlus. The third premise 
can be proved through case analysis on x and y. 

We next show how to prove bPlus x (bSucc y) = bSucc (bPlus x y) by reusing 
the proof of ∀m n. m + Suc n = Suc (m + n) from the theory of natural numbers. 
The abstraction tactic gives us the theorem: 

⟦ ∀P n. ⟦ P zero; ∀n. P n ⟹ P (suc n) ⟧ ⟹ P n; 
  ∀n. plus zero n = n; 
  ∀m n. plus (suc m) n = suc (plus m n) ⟧ 
⟹ plus m (suc n) = suc (plus m n) 

Instantiation (zero with Zero, plus with bPlus, suc with bSucc) yields a theorem 
with three premises, which are identical to the premises of (5), except that 
there are no vacuous quantified variables in the first and third premise. Hence, 
resolution with the theorems needed above directly proves the goal. 

4.3 Moving Theorems Along Signature Morphisms 

In the previous examples, the process of moving theorems from the theory Nat to 
bNat is quite mechanical: take a theorem from Nat, abstract all operations and 
types from Nat, then instantiate the resulting variables with the corresponding 
operations from bNat. In general, we can move theorems from a source theory to 
a target theory if there is a suitable mapping of types and operations between the 
theories. Such mappings between types and operations are known as signature 
morphisms. 

A signature $\Sigma = (T, \Omega)$ is given by type constructors $T$, with arity 
$ar_T : T \to \mathbb{N}$, and operations $\Omega$, with arity $ar_\Omega : \Omega \to T^*$. 
$T^*$ is the set of all well-formed types built from the type constructors and a 
(finitely countable) set of type variables. A signature morphism is a map between 
type constructors and operations preserving the arities of the type constructors, 
and the domain and range of the operations. Formally, given two signatures 
$\Sigma_1 = (T_1, \Omega_1)$ and $\Sigma_2 = (T_2, \Omega_2)$, a signature morphism 
$\sigma : \Sigma_1 \to \Sigma_2$ is given by a map $\sigma_T : T_1 \to T_2$ on 
type constructors and a map $\sigma_\Omega : \Omega_1 \to \Omega_2$ on operation 
symbols, such that 

$$\forall \tau \in T_1.\ ar_{T_1}(\tau) = ar_{T_2}(\sigma_T(\tau)) \qquad (7)$$

$$\forall \omega \in \Omega_1.\ \overline{\sigma}_T(ar_{\Omega_1}(\omega)) = ar_{\Omega_2}(\sigma_\Omega(\omega)) \qquad (8)$$

where $\overline{\sigma}_T : T_1^* \to T_2^*$ is the unique extension of $\sigma_T$ to 
all well-formed types. A partial signature morphism is given by partial maps 
$\sigma_T : T_1 \rightharpoonup T_2$ and $\sigma_\Omega : \Omega_1 \rightharpoonup \Omega_2$ 
such that all type constructors appearing in the source of any operation in 
the domain of $\sigma_\Omega$ are in the domain of $\sigma_T$. 

Let $Thy_1$ and $Thy_2$ be Isabelle theories with signatures $\Sigma(Thy_1)$ and 
$\Sigma(Thy_2)$, and let $\sigma : \Sigma(Thy_1) \to \Sigma(Thy_2)$ be a signature 
morphism. Any proof term from $Thy_1$ can be translated into a proof term in $Thy_2$ 
if the proof does not contain references to theorems from $Thy_1$. This gives us a 
canonical way of moving theorems from $Thy_1$ to $Thy_2$: first abstract all theorems 
from $Thy_1$ occurring in the proof of the theorem, then replace all type constructors 
$\tau$ with $\sigma_T(\tau)$ and all operation symbols $\omega$ with $\sigma_\Omega(\omega)$, 
and replay the proof. Conditions (7) and (8) ensure that the translated proof term 
is well-typed. 
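To make conditions (7) and (8) concrete, here is a small Python sketch of a checker for total signature morphisms. It is only an illustration: the encoding of types as nested tuples, the function names `extend` and `is_signature_morphism`, and the example signatures for Nat and bNat are assumptions of this sketch and do not reflect the ML implementation.

```python
# A type is either a type variable (a plain string such as "'a") or a tuple
# (constructor, arg1, ..., argn). For example ("fun", ("nat",), ("nat",))
# stands for nat -> nat. This encoding is an assumption of the sketch.

def extend(sigma_t, ty):
    """Extend a map on type constructors to all well-formed types."""
    if isinstance(ty, str):              # type variables are left unchanged
        return ty
    head, *args = ty
    return (sigma_t[head], *(extend(sigma_t, a) for a in args))


def is_signature_morphism(source, target, sigma_t, sigma_op):
    """Check totality and conditions (7) and (8) for the given maps."""
    src_arity, src_ops = source
    tgt_arity, tgt_ops = target
    # Both maps must be total on the source and land in the target.
    if set(sigma_t) != set(src_arity) or set(sigma_op) != set(src_ops):
        return False
    if not set(sigma_t.values()) <= set(tgt_arity):
        return False
    if not set(sigma_op.values()) <= set(tgt_ops):
        return False
    # (7): arities of type constructors are preserved.
    cond7 = all(src_arity[t] == tgt_arity[sigma_t[t]] for t in src_arity)
    # (8): the translated type of each operation is the type of its image.
    cond8 = all(
        extend(sigma_t, src_ops[w]) == tgt_ops[sigma_op[w]] for w in src_ops
    )
    return cond7 and cond8


def binary_op(t):
    return ("fun", (t,), ("fun", (t,), (t,)))


NAT = (
    {"nat": 0, "fun": 2},
    {"zero": ("nat",), "suc": ("fun", ("nat",), ("nat",)), "plus": binary_op("nat")},
)
BNAT = (
    {"bNat": 0, "fun": 2},
    {"Zero": ("bNat",), "bSucc": ("fun", ("bNat",), ("bNat",)), "bPlus": binary_op("bNat")},
)
SIGMA_T = {"nat": "bNat", "fun": "fun"}
SIGMA_OP = {"zero": "Zero", "suc": "bSucc", "plus": "bPlus"}
```

With these definitions, `is_signature_morphism(NAT, BNAT, SIGMA_T, SIGMA_OP)` holds, whereas mapping `plus` to `bSucc` violates condition (8) because the translated type of `plus` is not the type of `bSucc`.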

In order to extend the implementation of the theorem reuse method with 
mappings of this kind, we define an abstract ML type sig_morph for partial sig- 
nature morphisms. Signature morphisms are obtained using a constructor which 
checks the conditions (7) and (8) above, when given source and target signatures 
and the function graphs. We can apply the signature morphism in order to map 
types and terms from the source into the target signature. The invariants of the 
signature morphisms make sure that a translated term typechecks if the original 
term did. Given this type, we can define an abstraction tactic 

val abs_translate : sig_morph -> thm -> thm 

which moves a theorem along a signature morphism. Applying this abstraction 
tactic to our example, we can move any theorem from Nat to bNat, such as the 
theorems add_0_right, add_Suc_right (see Sect. 4.2), and add_commute: 

⟦ ∀P n. ⟦ P Zero; ∀n. P n ⟹ P (bSucc n) ⟧ ⟹ P n; 
  ∀n. bPlus Zero n = n; ∀m. bPlus m Zero = m; 
  ∀m n. bPlus (bSucc m) n = bSucc (bPlus m n); 
  ∀m n. bPlus m (bSucc n) = bSucc (bPlus m n) ⟧ 
⟹ bPlus m n = bPlus n m 

In the translated theorems, the applicability conditions correspond to the- 
orems that were used in the proof of the source theorems. This suggests that 
we can partially automate discharge of the applicability conditions when moving 
several theorems from a source theory to a target theory by considering the order 
in which the theorems are established in the source theory. 



4.4 Analysing Theorem Dependencies 

This section considers how to reduce proof work when moving theorems between 
theories. In the previous examples, we have seen that it was necessary to prove 
certain applicability conditions in the derived theorems. Some applicability con- 
ditions occur in several theorems (e.g. the induction rule above) and a derived 
theorem may occur as an applicability condition in another. Proof of applicabil- 
ity conditions may be considerably simplified by analysis of the source theory 
prior to theorem reuse. 

In the example of Nat and bNat, successor and addition for bNat were defined 
in terms of bit operations. This resulted in applicability conditions to ensure that 
the definition of addition in Nat was valid in bNat. In general, we would like to 
identify an appropriate, small set of theorems that need manual proof in a target 
theory in order to move a larger selection of theorems from the source theory to 
the target theory automatically. We shall call such a set of theorems an axiomatic 
base. Finding an axiomatic base is a process which is hard to automate; for the 
Nat example above, the axiomatic base is the Peano axioms and the definition of 
addition. Below, we give an algorithm which checks whether a given set of theorems 
forms an axiomatic base and, if so, provides an abstraction tactic to move the 
theorems across automatically. Isabelle’s visualisation tool for the dependency 
graph may help determine an appropriate axiomatic base. 

We say that a theorem $\varphi$ depends on another theorem $\psi$, written 
$\psi \to \varphi$, if $\psi$ occurs as a leaf node in the proof of $\varphi$. The 
premises $prem(\varphi)$ of a theorem $\varphi$ are the set of all theorems on which 
the theorem depends, i.e. $prem(\varphi) = \{\psi \mid \psi \to \varphi\}$. This allows 
the construction of a dependency graph for a theory, in which the nodes are theorems 
of the theory, and the (unlabelled) edges are $\psi \to \varphi$ for theorems $\psi$ 
and $\varphi$. The dependency graph of the source theory helps to identify an 
appropriate axiomatic base. 

Given a set $\Phi$ of theorems, let $pre(\Phi)$ denote the preconditions of $\Phi$, 
i.e. the set of all theorems needed to derive the theorems in $\Phi$. This set can 
be obtained from the dependency graph by a simple depth- or breadth-first search: 

$$pre(\Phi) = \Phi \cup pre(\{\psi \mid \psi \to \varphi \text{ for } \varphi \in \Phi\}).$$

A theorem $\varphi$ is directly derivable from a set $\Psi$ of theorems, written 
$\Psi \vdash \varphi$, if all its premises (in the theory) are contained in $\Psi$: 

$$\Psi \vdash \varphi \iff prem(\varphi) \subseteq \Psi$$

This means that if we have translated all theorems in $\Psi$, we can establish the 
translation of $\varphi$ by replaying its translated proof term. The set of theorems 
derivable by proof replay from a set of theorems $\Phi$ is the closure under 
derivability: 

$$der(\Phi) = \Phi \cup der(\{\varphi \mid \Phi \vdash \varphi\}).$$

Given a source theory with a set $\Phi$ of theorems and an axiomatic base $B$, 
a target theory, and a partial signature morphism between the theories, we can 
systematically abstract all theorems in the set 

$$A = pre(\Phi) \setminus pre(B)$$

and instantiate according to the signature morphism, deriving theorems in the 
target theory. A necessary condition for the success of this translation is that 
$A \subseteq der(B)$, i.e. the axiomatic base is strong enough to derive the theorems 
in $\Phi$. In order to move theorems to another theory, the theorems of the axiomatic 
base, translated according to the signature morphism, must be proved in the target 
theory. In the example of Sect. 4.2, if $\Phi$ includes add_commute, the axiomatic 
base will typically include the Peano axioms for addition, but need not include 
add_0_right, which is derivable from these axioms. If the theorems of $\Phi$ are 
translated in the order of dependency, such that the premises $prem(\varphi)$ are 
moved before $\varphi$ for all $\varphi \in A$, the applicability conditions of the 
derived theorems in the target theory can be discharged automatically. 

Our implementation provides a module which implements the necessary graph 
algorithms. For simplicity and speed, we refer to theorems by their name throughout. 
In particular, the function 

val saturate : DG -> string list -> string list 
               -> (string * string list) list 

will, given a dependency graph, an axiomatic base and a list of theorems, provide 
a list of pairs of theorems and the names of their premises in order of dependency. 
This list can be given to the abstraction tactic which moves across the theorems. 
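The graph computations can be sketched in a few lines of Python. This is an illustrative reading of the definitions of pre, der and the ordering step, not the ML module itself: the dictionary encoding of the dependency graph, the Python function `saturate`, and the theorem names in `GRAPH` are hypothetical. The sketch also assumes that theorems without recorded premises (axioms and definitions) are available only when they belong to the base, which the text does not state explicitly.

```python
def pre(graph, thms):
    """All theorems needed to derive thms (thms included), by graph search."""
    seen, stack = set(), list(thms)
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(graph.get(t, []))
    return seen


def der(graph, base):
    """Closure of base under direct derivability (all premises available)."""
    derived = set(base)
    changed = True
    while changed:
        changed = False
        for t, prems in graph.items():
            # Assumption: premise-free theorems must be supplied by the base.
            if t not in derived and prems and all(p in derived for p in prems):
                derived.add(t)
                changed = True
    return derived


def saturate(graph, base, thms):
    """Theorems to move, paired with their premises, in dependency order."""
    todo = pre(graph, thms) - pre(graph, base)
    missing = todo - der(graph, base)
    if missing:
        raise ValueError("not derivable from the base: %s" % sorted(missing))
    ordered, done = [], set()

    def visit(t):
        if t in done or t not in todo:
            return
        done.add(t)
        for p in graph.get(t, []):
            visit(p)                      # premises are emitted first
        ordered.append((t, list(graph.get(t, []))))

    for t in sorted(todo):
        visit(t)
    return ordered


# Hypothetical dependency graph for the Nat example: theorem -> premises.
GRAPH = {
    "nat_induct": [],
    "add_0": [],
    "add_Suc": [],
    "add_0_right": ["nat_induct", "add_0", "add_Suc"],
    "add_Suc_right": ["nat_induct", "add_0", "add_Suc"],
    "add_commute": ["nat_induct", "add_0", "add_0_right", "add_Suc", "add_Suc_right"],
}
BASE = ["nat_induct", "add_0", "add_Suc"]
```

Calling `saturate(GRAPH, BASE, ["add_commute"])` lists add_0_right and add_Suc_right before add_commute, so each theorem's applicability conditions are already available when it is moved. Dropping add_Suc from the base makes the call fail, since the remaining base is too weak.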

5 Related Work 

The problem of proof reuse has been addressed previously. Some approaches 
apply branches or fragments from old proofs when solving new problems: Melis 
and Whittle [14] study reasoning by analogy, a technique for reusing problem 
solving experience by proof planning; Giunchiglia, Villafiorita, and Walsh [7] 
study abstraction techniques, where a problem is translated into a related ab- 
stract problem which should be easier to prove as irrelevant details are ignored; 
and Walther and Kolbe, in their Plagiator system [26], suggest proof reuse 
in first-order equational theories by so-called proof catches, a subset of the leaf 
nodes in a proof tree, similar to our applicability conditions. 

The KIV system reuses proof fragments to reprove old theorems after 
modifications to an initial program [22]. The approach exploits a correspondence 
between positions in a program text and in the proofs, so that subtrees of the 
original proof tree can be moved to new positions. This depends on the 
underlying proof rules, so the approach is targeted towards syntax-driven proof 
methods typical of program verification. A more semantic approach is that of 
development graphs as implemented in Maya [3], where a specification is represented 
by a development graph, a richer version of the dependency graphs from Sect. 4.4. 

In a logical framework setting, Felty and Howe [6] describe a generic approach 
to generalisation and reuse of tactic proofs. In their work, a proof is a nested 
series of proof steps which may have open branches. Reuse is achieved by replac- 
ing the substitutions of a proof with substitutions derived from a different proof 
goal by means of higher-order resolution. This opens up an elegant way to reuse 
steps from abortive proof attempts for e.g. unfortunate variable instantiations, 
which can to some extent be mimicked by considering different unifiers for our 
derived inference rules. In contrast to the cited works, our approach allows a 
generalisation over types as well as function symbols. In particular, proof reuse 
as in the examples of Sect. 4 is not feasible in the cited approaches. 

Proof reuse and generalisation of theorems have been studied in the Coq 
system. Proofs in Coq resemble proof terms in Isabelle, but in a richer type the- 
ory. Pons, Magaud and Bertot [13,20] consider transformation of proofs, similar 
to ours, replacing operation symbols and types by variables, and Magaud and 
Bertot [13] consider change of data representation in this setting, studying in 
particular the same example as in Sect. 4.2 (in fact, our example was inspired by 
this work), extended further to type-specific inference rules in [12]. The richer 
type theory used by Coq makes proof term manipulation more involved than 
in the logical framework setting of our approach. For reuse, proofs have to be 
massaged into a particular form, and e.g. induction and case distinction have to 
be given special treatment. There are particular methods which either generalise 
theorems [13,20], or handle change of data type representations [12,13]. In our 
approach induction and case distinction are represented as meta-logic axioms, 
which allows a uniform treatment of these situations by appropriate abstraction 
tactics. For example, the dependency analysis (Sect. 4.4) is built on top of the 
more elementary abstraction and reuse tactics. Further, theorems abstracted 
with our method may be instantiated several times in different settings, thus 
allowing multiple reuse. 

Dependency graphs and similar structures have been considered in systems 
such as KIV or Maya [3]. Isabelle can visualise the dependency graph, but not in 
an interactive way. More interesting here is the work by Bertot, Pons and Pot- 
tier [5], who implemented an interactive visualisation of the dependency graph, 
allowing manipulations such as removing and grouping of edges and labels. An 
interactive tool in this vein would greatly aid the user in establishing an ax- 
iomatic base. 

6 Conclusion and Future Work 

This paper demonstrates how theorems can be generalised from a given setting 
and reapplied in another, exploiting the possibilities offered by proof terms in 
logical framework style theorem provers. This approach combines proof term 
manipulation as known from type theory with the flexibility and power of log- 
ical frameworks. The combination is particularly well suited for changing data 
representations because object logic inference rules and theorems may be given a 
uniform treatment in both the abstraction and reuse process. Consequently, the 
transformation method may be applied to any theorem in a direct way, allowing 
multiple reuse of the abstracted theorem in different settings. 

The considered strategies for reuse point in interesting directions. Signature 
morphisms are used as a structuring mechanism in algebraic specification lan- 
guages such as CASL [2], and for structured development in e.g. Maya [3] or 
Specware [23,25]. The proposed analysis of theorem dependencies is promising, 
and should be supported by a (probably graphical) tool, which would allow the 
user to interactively determine an axiomatic base for the theory, assisted by 
appropriate heuristics. 

In addition to theorem reuse as discussed in this paper, the proposed method 
may have applications in formal program development. In this field, several ap- 
proaches have been suggested based on specialised transformation rules [23, 24] 
or deriving rules from theorems [1,11]. However, a coherent framework which 
allows users to systematically generalise existing developments to a widely 
applicable set of transformation rules is still lacking. The proposed abstraction 
method may be of use here. A small demonstration of this application, deriving 
transformation rules from correctness proofs of data refinements, may be found 
in [10]. 

The suggested proof term transformations and reuse strategies have been 
implemented in Isabelle 2003. The implementation comprises only about one 
thousand lines of ML code, with the abstraction tactics accounting for roughly 
40%, dependency analysis and signature morphisms about 30%, and auxiliaries 
and utilities the rest. The compactness of the code suggests that the framework 
of meta-logic proof terms provided by Isabelle is well-suited for this kind of 
transformation. At a technical level, there are several ways of improving the 
proposed abstraction method. An interesting improvement is to incorporate the 
technique of coloured terms [9], which would allow several variables to replace 
the same function symbol during abstraction. 

Abstract. Previously, we have found that incremental refinement, using 
the event based B method, supports the effective management of 
proof complexity when verifying properties of IO protocols, such as PCI. 
In the context of IO protocols, another important verification task is to 
show that a new protocol can be made compatible with an existing protocol. 
In this article, we report on our efforts to show that RapidIO, a 
new IO standard, can be made compatible with the transaction ordering 
property expected by PCI, a legacy standard. This is done by creating a 
refinement sequence for RapidIO as a branch in our previously completed 
PCI refinement chain. We found that incremental refinement simplifies 
specification reuse and allows a precise statement of compatibility. Ongoing 
work seeks to identify proof engineering methods to increase proof 
reuse for compatibility proofs. 



1 Introduction 

The introduction of a new IO standard, such as RapidIO, requires a great deal 
of design and verification effort to maintain adequate compatibility with legacy 
standards while improving performance. In this paper, we report on our efforts 
to show compatibility between RapidIO [Rap01] and PCI [PCI95] using 
incremental refinement from a common specification. This work builds on our 
previous efforts to refine PCI from a specification of its transaction ordering 
property [WJM+02]. 

In simplest terms, refinement is a way of showing that a machine implements 
a specification. Compatibility is the problem of showing that two machines sat- 
isfy the same specification. One way to show compatibility is to create a branch 
in the refinement chain. Incremental refinement supports such an approach by 
enforcing a rigid theory structure in which it is easy to identify where to create 
the branch. The resulting proof shows precisely at what level of abstraction and 
for which properties two machines are compatible. Both of these results would 
be more difficult to obtain in a general-purpose theorem prover without using 
the same theory structure. Since refinement tools already support such a theory 
structure, they are a natural choice for showing compatibility. 

The primary purpose of this work was to investigate the suitability of using 
the B method [Abr96] (as implemented in AtelierB [Cle02]) to show compatibil- 
ity between the PCI and RapidIO protocols for the PCI transaction ordering 
property. We reused a refinement chain showing that PCI implements the property 
and created a new chain for RapidIO. The RapidIO chain includes 4 new 
abstract machines and required completing 952 non-obvious proof obligations. 
Of these, about 700 simply required the correct invocation of AtelierB decision 
procedures. The sequence of RapidIO models identifies the subset of RapidIO 
network topologies and message types for which RapidIO can be used to implement 
PCI transaction ordering. 

While one case study does not a methodology make, we found that incremental 
refinement is an excellent method for reusing formal specifications when 
showing compatibility between IO standards. Specifically, we found that refinement 
imposes a useful structure on the refinement proofs and supports 
precise statements about the level of abstraction at which compatibility 
is proven. More generally, refinement can be useful when several implementations 
must be shown to satisfy the same complex specification. This problem 
frequently arises in industrial microprocessor and system-on-a-chip design. 

We had hoped to get more proof reuse in addition to specification reuse. 
However, the structure of the first refinement chain dictated that the RapidIO 
chain should begin at a rather high level of abstraction. Since most of the hard 
proof work appeared lower in the chain, we obtained little proof reuse. 

The next section explains event based refinement in the B method. Section 3 
describes the problem of showing PCI/RapidIO compatibility in detail and in- 
troduces the approach we took to making a subset of RapidIO compatible using 
an adaptor. Section 4 describes a solution to the compatibility problem and the 
models and proofs used in the formalization of the solution are presented in Sec- 
tion 5. We discuss proof metrics and make a comparison with higher-order logic 
theorem proving in Section 6. We offer conclusions and future work in Section 7. 

2 Event Based Refinement and AtelierB 

The goal in the event based B method is to show that successively more com- 
plex concrete systems refine an abstract system. We do this incrementally to 
distribute the difficulty of the entire sequence. In this approach, a system con- 
sists of state variables, invariants and events. The invariants describe properties 
that must be true of the state variables and the events describe transitions be- 
tween values of state variables. 

In particular, an event has the form 

any x where P(x, v) then v := E(x, v)    (1) 

for local state x and global state v. The predicate P is called the guard and the 
relation E is the update relation. For this event, the global state v is updated to 
E(x, v) if there exists an x such that P(x, v) holds. The event in Equation 1 is 
said to be refined by an event 

any y where Q(y, w) then w := F(y, w) 

if the following proof obligation can be discharged: 

$$I(v) \land J(v, w) \land Q(y, w) \Rightarrow \exists x.\ P(x, v) \land J(E(x, v), F(y, w))$$

for invariant I and gluing invariant J. 
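For small finite state spaces, the proof obligation can be checked by brute-force enumeration, which may help in reading it. The Python sketch below is only an illustration and has nothing to do with how AtelierB discharges obligations; the function `refines` and the counter events are assumptions made up for this example.

```python
from itertools import product


def refines(abstract, concrete, inv, glue, vs, ws, xs, ys):
    """Check I(v) & J(v,w) & Q(y,w) => exists x. P(x,v) & J(E(x,v), F(y,w))."""
    guard_p, update_e = abstract
    guard_q, update_f = concrete
    for v, w, y in product(vs, ws, ys):
        if inv(v) and glue(v, w) and guard_q(y, w):
            # Look for an abstract local value x that simulates the concrete step.
            witness = any(
                guard_p(x, v) and glue(update_e(x, v), update_f(y, w))
                for x in xs
            )
            if not witness:
                return False
    return True


# Hypothetical example: a bounded counter.
MAX = 5
STATES = range(MAX + 1)

# Abstract event: any x where x in {1, 2} and v + x <= MAX then v := v + x
abstract_step = (lambda x, v: x in (1, 2) and v + x <= MAX, lambda x, v: v + x)
# Concrete event: increments by exactly one (the local y is unused).
concrete_inc = (lambda y, w: w + 1 <= MAX, lambda y, w: w + 1)
# A concrete event that jumps by three has no abstract counterpart.
concrete_jump = (lambda y, w: w + 3 <= MAX, lambda y, w: w + 3)


def inv(v):
    return 0 <= v <= MAX


def glue(v, w):
    return v == w
```

Here `refines(abstract_step, concrete_inc, inv, glue, STATES, STATES, (1, 2), (0,))` succeeds with witness x = 1, while the same call with `concrete_jump` fails because no abstract step of size 1 or 2 re-establishes the gluing invariant.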

In every system there is a special event called the skip event. The skip event 
has no guard and does not change any state variables. An event refines the skip 
event if it also does not change any state variables. An event that refines the 
skip event is called a silent event. 

Given two systems, A and C, the abstract system A is refined by a more 
concrete system C, written $C \sqsubseteq_s A$, if the following three conditions are satisfied: 

— Correctness. Every event in C refines either an event in A or the skip event 
in A. Otherwise, an event in C could perform a behavior not duplicated by A. 

— Progress. A non-silent event in C is always eventually enabled. Otherwise, C 
might continuously perform silent events and appear to do nothing relative 
to the behavior of A. 

— Relative deadlock freedom. At least one guard in C is satisfied whenever 
at least one guard in A is satisfied. Otherwise, C would introduce more 
deadlocks than A. 

AtelierB is an implementation of the B method. AtelierB consists of three 
main parts: a specification language, a proof obligation generator and an interactive 
proof assistant. The specification language encodes first order logic extended 
with set theory. While the specification language is not based on higher-order 
logic per se, it is capable of expressing similar higher-order concepts using 
quantification over the subsets of a set [ACL02]. The proof assistant includes a 
“predicate prover” (PP) which is a decision procedure for the predicate calculus. A 
more detailed discussion of the predicate prover and the similarities between 
AtelierB and HOL proof assistants can be found in [AC03]. 

Given an abstract system and a machine, the AtelierB proof tool generates 
proof obligations only for the correctness condition of system refinement. 
The progress condition can be met using a measure function 
in silent events of the concrete machine. The relative deadlock freedom condition 
can be met by proving that the disjunction of the guards in the abstract machine 
implies the disjunction of the guards in the concrete machine under the gluing 
invariant. In this case study, we have only proved the correctness condition for 
every refinement step. 

3 PCI/RapidIO Compatibility Problem 

Figure 1 shows the system that can be constructed after completing a 
compatibility proof. Since RapidIO maintains the p/c property, a set of PCI terminals 
can use a RapidIO network (possibly through an adaptor, as described shortly) 
to communicate while preserving PCI transaction ordering. The motivation for 
solving the compatibility problem is to show that PCI devices can be 
connected across a RapidIO network while maintaining PCI transaction ordering. 
This is not possible for arbitrary RapidIO networks because RapidIO includes 
behaviors that violate PCI transaction ordering. More precisely, the problem 
is to show how certain RapidIO networks can be used to connect PCI devices 
(through an adaptor) while preserving PCI transaction ordering. Transaction 
ordering for PCI devices is defined by the producer/consumer (p/c) property. 

Fig. 1. Connecting PCI devices across a RapidIO network while preserving PCI 
transaction ordering properties. 

The relationship needed to implement the system shown in Figure 1 can be 
stated more precisely in terms of system refinement. The definition is somewhat 
idealized to precisely capture our notion of compatibility. The idealization 
presupposes the existence of concrete formal models that define each protocol. 
Since both protocols are defined with English prose, no such model exists for 
either protocol. We use $\mathcal{PCI}$ and $\mathcal{RIO}$ to denote the mathematical 
definitions of PCI and RapidIO. For the p/c property, we say that $\mathcal{RIO}$ with 
adaptor $A$ is compatible with $\mathcal{PCI}$ for p/c at level $M$ iff: 

$$\mathcal{PCI} \sqsubseteq_s M \sqsubseteq_s p/c \quad \text{and} \quad \mathcal{RIO} \cup A \sqsubseteq_s M \sqsubseteq_s p/c$$


We have proven this property for models of PCI and RapidIO that are more 
abstract than $\mathcal{PCI}$ and $\mathcal{RIO}$, but concrete enough to include a 
significant amount of detail. Since the actual descriptions of RapidIO and PCI 
consist of imprecise English prose and figures, completing refinement down to 
$\mathcal{RIO}$ and $\mathcal{PCI}$ is not possible. 

The remaining parts of this section describe the PCI and RapidIO protocols 
and describe several reasons why the protocols are incompatible. 



3.1 PCI 

The PCI standard defines two kinds of messages: posted and delayed. Posted 
messages are unacknowledged writes that may not be dropped during transit. 
Delayed messages are acknowledged reads or writes that may be dropped during 
transit before they are committed. A delayed transaction is committed when 
an attempt has been made to send it across an electrical bus. Messages are 
sent and received by agents and stored in intermediate bridges. A bridge sits 
between two electrical busses and contains two queues, one facing each bus. 
Messages may pass each other in queues under the condition that no message 
may pass a posted message. The passing rules are designed to avoid deadlock 
while preserving end-to-end message ordering rules. 

The end-to-end message ordering rules are stated in terms of a distributed 
p/c property in Appendix A of the PCI specification. The p/c property states 
that: 

1. if an agent, the producer, issues a write to a data address followed by a 
write to a flag address, where the flag and data addresses may be located at 
different agents, and 

2. another agent, the consumer, issues a read to the flag address, and 

3. the consumer issues a read to the data address after receiving the results of 
the flag read, and 

4. the producer’s flag write had arrived before the consumer’s flag read, then 

5. the consumer’s data read always returns the value written by the producer, 
assuming no other values were written to the data address in the interim. 

This property is particularly interesting for two reasons. First, the producer’s 
writes and the consumer’s reads can be in transit in the network at the same 
time and may pass each other in accordance with the passing rules. Second, the 
published PCI standard was intended to satisfy this property, but does not. Our 
previous refinement case study demonstrated that a corrected version of the PCI 
standard does satisfy this property [WJM+02]. 

3.2 RapidIO 

The RapidIO standard defines a high-speed serial interconnect designed to re- 
place slower bus-based interconnects such as PCI. RapidIO messages consist of 
requests and replies. RapidIO networks consist of agents connected by a network 
of switches. RapidIO switches contain four queues and may be arranged in any 
topology. Messages must be routed through the switches in a way that avoids 
cyclic topology based deadlocks. 

Both requests and replies are issued within a priority flow. A priority flow is 
defined by the priority, source and destination of a message. Switches maintain 
message ordering within a priority flow but not between priority flows. There are 
four priority classes in RapidIO. Requests are issued in the lowest three classes 
while replies may be issued in any class. A reply to a request at level i may be 
issued at the next highest level i+1 to prevent deadlock. RapidIO switches may 
not drop messages but endpoints may ignore messages and issue retry requests. 
Retry requests are returned to the sender and cause the sender to resend the 
discarded message. 
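The flow-level ordering rule can be phrased as a simple check on message sequences. The Python sketch below is a deliberately simplified illustration: it models only the statement that ordering is maintained within a priority flow, ignores switch ports, retries and the priority promotion of replies, and uses hypothetical names (`Message`, `flow`, `order_allowed`) and message labels.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    name: str
    priority: int
    source: str
    destination: str


def flow(m):
    """A priority flow is identified by priority, source and destination."""
    return (m.priority, m.source, m.destination)


def by_flow(messages):
    """Group message names by flow, keeping their relative order."""
    flows = defaultdict(list)
    for m in messages:
        flows[flow(m)].append(m.name)
    return dict(flows)


def order_allowed(sent, delivered):
    """True if delivery keeps the order within every flow and drops nothing."""
    return by_flow(sent) == by_flow(delivered)


# Hypothetical messages: two writes from the producer and one read from the consumer.
write1 = Message("datawrite1", 0, "producer", "data")
write2 = Message("datawrite2", 0, "producer", "data")
read = Message("dataread", 0, "consumer", "data")
```

Under this model, `order_allowed([write1, write2, read], [read, write1, write2])` holds because the read belongs to a different flow and may overtake the writes, whereas delivering `write2` before `write1` is rejected.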

3.3 Why They Don’t Work Together 

Because RapidIO passing rules and topologies are more relaxed than PCI, the 
two protocols are not immediately compatible. There are seven specific problems 
we identified in our efforts to make them compatible. 

1. RapidIO uses switches instead of a single shared bridge as in PCI. This 
allows messages to take separate paths to a common destination. 

2. RapidIO switches cannot know what types of messages are being sent, but 
PCI bridges have to so that they can control message reordering. This means 
that the RapidIO switches cannot be in control of message ordering based 
on message type. 

3. RapidIO uses priority classes to enforce message ordering. This could allow 
p/c dataread messages to pass datawrite messages if they are not sent with 
the correct priority. 

4. Messages are ordered in transaction flows (flows originated at the same 
source). This allows dataread to pass datawrite because they originate at 
different sources. 

5. Priority is only used to order messages that enter a RapidIO switch at a 
common port and are targeted to a common destination port. This could 
allow the dataread to pass the datawrite even if they are sent at the same 
priority but enter a switch at different ports. 

6. Retries in RapidIO propagate back to the originating device because switches 
do not store retry information. PCI devices do not retry messages because 
PCI bridges handle retrying packets. 

7. When a message is retried in RapidIO, all further messages are silently dis- 
carded as it is assumed that the sending device will resend the retried mes- 
sage and all subsequent messages until it gets back to the point at which it 
received the retried message. 

Since it is evident that RapidIO and PCI are not completely compatible, 
the central challenge in showing their compatibility is showing which RapidIO 
networks and messages can be used to satisfy PCI ordering rules. 

4 Solving the Compatibility Problem 

There are two requirements a RapidIO network must meet in order to guarantee 
the message ordering necessary to maintain the p/c property. The first is that 
the datawrite, flagwrite and dataread messages all be sent with the same pri- 
ority. The second is that there must be at least two consecutive switches that 
the datawrite, flagwrite and dataread messages pass through on the way to the 
flag and data devices; and that the datawrite and the flagwrite are mutually 
constrained from the producer to the second common switch. By mutually con- 
strained, we mean that the path they take to the second common switch must 
be the same. The path itself is unconstrained. 

These requirements, along with the message ordering rules of RapidIO, guar- 
antee the message ordering behavior needed to ensure the p/c property. Forcing 
the dataread to pass through the same two consecutive switches as the datawrite 
causes the RapidIO switch ordering rules to take effect. RapidIO guarantees con- 
sistent path selection from any switch to a single destination. The dataread and 
the datawrite are both destined for the data device, so from that second common 
switch they are guaranteed to follow the same path. 
From the second common switch to the data device, the datawrite and the 
dataread will always enter the switches at the same port and leave the switches at 
the same port. If they have the same priority then the dataread cannot pass the 
datawrite as long as the dataread enters the second common switch behind the 
datawrite. The flagwrite cannot pass the datawrite before the second common 
switch because they are mutually constrained up to the second common switch 
and they are required to be sent at the same priority. The dataread cannot 
pass the flagwrite because it cannot be sent until the flagread has completed. 
Other cases are unimportant because the p/c property is only required when the 
flagread does not pass the flagwrite. 
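
The two requirements can be read as a check on the routes taken by the three messages. The following Python sketch is only an illustration under stated assumptions: routes are lists of hypothetical switch names, no route visits the same switch twice, and the function names and dictionary layout are invented here rather than taken from the models in this paper.

```python
def consecutive_pairs(route):
    """All (a, b) pairs of switches that a route traverses back to back."""
    return list(zip(route, route[1:]))


def first_common_pair(routes):
    """First consecutive switch pair of routes[0] shared by every route."""
    for pair in consecutive_pairs(routes[0]):
        if all(pair in consecutive_pairs(r) for r in routes[1:]):
            return pair
    return None


def meets_requirements(routes, priorities):
    """routes and priorities are dicts keyed by message name (assumed layout)."""
    names = ("datawrite", "flagwrite", "dataread")
    # Requirement 1: all three messages are sent at the same priority.
    if len({priorities[n] for n in names}) != 1:
        return False
    # Requirement 2a: two consecutive switches common to all three routes.
    pair = first_common_pair([routes[n] for n in names])
    if pair is None:
        return False
    # Requirement 2b: datawrite and flagwrite follow the same path from the
    # producer up to and including the second common switch.
    second = pair[1]
    dw, fw = routes["datawrite"], routes["flagwrite"]
    return dw[: dw.index(second) + 1] == fw[: fw.index(second) + 1]
```

The check says nothing about retries; it only captures the priority and path constraints described above.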

Unfortunately, the constraints discussed so far do not solve the problem 
caused by the different retry mechanisms in PCI and RapidIO. We solve this 
problem using a PCI-to-RapidIO adapter. An adapter is required anyway to 
translate the messages between the two protocols. We add functionality 
to the adapter to facilitate retries and still maintain the p/c property. 

We do not explicitly identify the implementation-specific details of the adapter. 
We only identify general characteristics the adapter must have. First, at the 
sending end, the adapter must have a method of queuing sent messages until 
they have posted at their destination so that the adapter can retry the packets if 
necessary. Second, the adapter must guarantee that the messages are delivered 
to the PCI device in the order that they were received at the adapter. Third, 
if the adapter retries a write, it must at least block all reads to that address, 
possibly by queuing those reads or by retrying those reads, until it receives 
and accepts the retried write. A strength of refinement is that we can use an 
abstract description of an adapter in the proof. Any implementation that refines 
this model will preserve correctness. 



5 Models and Proofs 

The goal of our research is to show that compatibility between I/O standards can 
be demonstrated by using incremental refinement from a common specification. 
We examined the PCI models previously used in [WJM+02], and determined 
that the best point at which to make the break in the refinement chain was after 
the third PCI model. The first three PCI models describe a subset of RapidIO 
behavior, but the fourth PCI model introduces PCI-specific topology and bridge 
behavior, making it unsuitable as an abstract description of RapidIO. Once the 
decision was made to branch our refinement after the third PCI model, we were 
presented with the question of exactly how to model RapidIO. The first three PCI 
models will be described here in order to provide background for understanding 
the RapidIO models, which will be discussed in more detail in following sections. 



5.1 Abstraction of the Producer/Consumer Property 

The first model abstracts the p/c property and introduces the basic variables 
used in the refinement. A set, DATA, is created, as well as two constants, AD 
and VAL, which are defined to be subsets of DATA, where AD ∩ VAL = ∅. 
The variables introduced in the model are prod and cons, which are defined as 
partial functions from AD to VAL (e.g. prod ∈ AD ⇸ VAL). The invariant 
of the model requires that cons ⊆ prod. There are only two events in the first 
model, produce and consume. They are defined as follows: 

produce = any ad, data where 
    ad ∈ AD ∧ ad ∉ DOM(prod) ∧ data ∈ VAL 
then 
    prod := prod ∪ {ad ↦ data} 
end; 

consume = any ad where 
    ad ∈ AD ∧ ad ∈ DOM(prod) ∧ ad ∉ DOM(cons) 
then 
    cons := cons ∪ {ad ↦ prod(ad)} 
end; 

This model is easily understood and describes correct p/c behavior. The 
produce event takes an address to which data has not been written, and assigns 
some data to it. The consume event then takes any address which has had data 
written to it, and adds that address to the set of addresses that have been read. 
This prevents a Consumer device from reading the data until the Producer device 
has written that data. 
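
To make the two guards concrete, here is a minimal Python sketch of this first model. It is an illustration only: the class name Model1, the use of dictionaries for the partial functions prod and cons, and the boolean return values that report whether a guard held are all assumptions made for this example, not part of the B model.

```python
class Model1:
    """First abstract model: prod and cons are partial functions AD -> VAL."""

    def __init__(self):
        self.prod = {}  # addresses the producer has written, with their data
        self.cons = {}  # addresses the consumer has read, with their data

    def produce(self, ad, data):
        # Guard: ad must not already be in DOM(prod).
        if ad in self.prod:
            return False
        self.prod[ad] = data
        return True

    def consume(self, ad):
        # Guard: ad in DOM(prod) and ad not in DOM(cons).
        if ad not in self.prod or ad in self.cons:
            return False
        self.cons[ad] = self.prod[ad]
        return True

    def invariant(self):
        # cons is a subset of prod, viewing each function as a set of pairs.
        return set(self.cons.items()) <= set(self.prod.items())
```

Calling consume on an address before produce leaves the state unchanged, which is the informal content of the p/c property at this level.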

5.2 Adding the Flag and Data Devices 

The second PCI model adds variables and events to describe the Flag and Data 
devices. In this model, the Producer, Consumer, data device and flag device 
agents are distributed but communication is “magic” in the sense that the exact 
mechanism for communication is not specified. The new variables are mem and 
flag, which are defined as partial functions from AD to VAL. In addition to the 
requirement of the first model, the invariant states: 

    cons ⊆ flag ⊆ mem ⊆ prod 

So, data can only be consumed if it has been produced, written to memory 
and had a flag set. 

There are two new events added to this model, which refine the silent, or skip, 
events of the first PCI model. These events send information between devices. 
The datawrite event takes messages that have been produced (where {ad ↦ 
VAL} ∈ prod), and adds them into a set called mem. The second new event, 
flagwrite, takes messages which are in the mem set and adds them to a set, flag. 
While the produce event remains unchanged in this model, the consume event is 
changed to require that ad ∈ DOM(flag) in its guard, but is otherwise the same. 

5.3 Message Transactions 

In the next model, actual message transactions are described. The primary de- 
tails added in this model are split reads in which the sending of a read message 
and the return of the data are separated. Six new sets are used, dw for datawrite 
transactions, dr for dataread transactions, cdr for dataread completion messages, 
fw for flagwrite messages, fr for flagread messages, and cfr for flagread 
completion messages. Each of these new sets is defined as a subset of the addresses 
set (e.g. dw ⊆ AD). So, if an address is in one of the preceding sets, it can 
be easily determined which message has been written for that specific address. 
For example, if an address is in the dr set, then the dataread message has been 
received by the Data device for that address, and if the same address is not in 
the cdr set, then the dataread completion message has not yet been received by 
the Consumer device. 

The following statements are added to the invariant: 

    dw = DOM(mem) ∧ fw = DOM(flag) ∧ fr ⊆ fw ∧ cfr ⊆ fr ∩ dw ∧ 
    dr ⊆ dw ∧ dr ⊆ cfr ∧ cdr ⊆ dr ∧ DOM(cons) ⊆ cdr 

The p/c property in this particular model is expressed as: fr ⊆ fw ∧ dr ⊆ 
dw, meaning that the flag for a specific address can only be read if the flag has 
been written, and that the data for a specific address can only be read if the 
data has been written for that address. 

The new events are flagread, compflagread, dataread, and compdataread. The 
guard of the consume event is changed to require that ad ∈ cdr, and the flagwrite 
and datawrite events add the address to their respective sets. Specifically, the 
line dw := dw ∪ {ad} is added to the substitution of datawrite and the line 
fw := fw ∪ {ad} is added to the substitution of flagwrite. The new events are: 

flagread = any ad where 
    ad ∈ AD ∧ ad ∉ fr ∧ ad ∈ fw 
then 
    fr := fr ∪ {ad} 
end; 

compflagread = any ad where 
    ad ∈ AD ∧ ad ∉ cfr ∧ ad ∈ fr 
then 
    cfr := cfr ∪ {ad} 
end; 

dataread = any ad where 
    ad ∈ AD ∧ ad ∈ cfr ∧ ad ∉ dr 
then 
    dr := dr ∪ {ad} 
end; 

compdataread = any ad where 
    ad ∈ AD ∧ ad ∈ dr ∧ ad ∉ cdr 
then 
    cdr := cdr ∪ {ad} 
end; 
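
The event structure of this third model is regular enough to tabulate. The Python sketch below is an illustration under assumptions: each event is reduced to "the set it extends" plus "the sets the address must already be in", every address is assumed to have been produced already, and the names EVENTS, make_state, fire and invariant are hypothetical. The dataread event is taken to extend dr, as its guard (ad not in dr) requires.

```python
def make_state():
    """One empty address set per message kind of the third model."""
    return {name: set() for name in ("dw", "fw", "fr", "cfr", "dr", "cdr")}


# event name -> (set the event extends, sets the address must already be in)
EVENTS = {
    "datawrite": ("dw", []),
    "flagwrite": ("fw", ["dw"]),
    "flagread": ("fr", ["fw"]),
    "compflagread": ("cfr", ["fr"]),
    "dataread": ("dr", ["cfr"]),
    "compdataread": ("cdr", ["dr"]),
}


def fire(state, event, ad):
    """Apply an event to address ad if its guard holds; report whether it did."""
    target, required = EVENTS[event]
    enabled = ad not in state[target] and all(ad in state[r] for r in required)
    if enabled:
        state[target].add(ad)
    return enabled


def invariant(s):
    """The set-inclusion part of the invariant stated above."""
    return (
        s["fr"] <= s["fw"]
        and s["cfr"] <= (s["fr"] & s["dw"])
        and s["dr"] <= s["dw"]
        and s["dr"] <= s["cfr"]
        and s["cdr"] <= s["dr"]
    )
```

Firing the six events in the order listed for a single address succeeds at every step, while firing dataread first is rejected because the address is not yet in cfr.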

5.4 Modeling RapidIO Switches and Topology 

The introduction of PCI-specific topology constraints and bridge behavior in 
the fourth PCI model required the branch to the RapidIO refinement chain to 
be made after the third PCI model. It seemed natural, therefore, to include the 
necessary RapidIO topology constraints in our first RapidIO model. 

As explained in previous sections, we modeled RapidIO topology by including 
two switches. Because the ordering rules within the RapidIO switches are based 
on ports within the switch, each switch is sufficiently represented by sets for its 
input and output ports. The behavior inside a switch is omitted in this model, 
but added later. 




Fig. 2. Each device is eventually connected to the two switches in this manner. The 
dotted lines represent paths of switch sequences. 



The first switch has two input ports and one output port. The second switch 
has one input port (connected to the first switch’s output port) and two output 
ports. Each input and output port of each switch is described using a different 
function. The connections between switch ports and the names of the 
corresponding port functions are shown in Figure 2. The function s1i1 represents 
the first input port of the first switch, s1out represents the output port of the 
first switch, s2in represents the input port of the second switch, and s2o1 
represents the first output port of the second switch. Our additional p/c topology 
constraints require that the Producer device connect to the RapidIO network 
through s1i2, that the Consumer device connect through s1i1, and the Data and 
Flag devices through s2o1 and s2o2, respectively. 

We define a set of states: 

STATES = {CDR, FW, DR, DW, FR, CFR} 

where each element of STATES is an element of the DATA set (e.g. CDR ∈ 
DATA), and each element of STATES is distinct. 

Each switch function is defined as a partial injection from the natural numbers 
to addresses paired with data (e.g. s1i1 ∈ ℕ ⤔ AD × STATES). A counter 
for each switch (for example, cs1in and cs1out are the counters for messages 
entering and exiting the first switch) is assigned to each message when the message 
passes through the port, and the counter is incremented after the message passes 
through. The port function maps the counter to the message. The counters are 
used to determine the order in which messages have entered the switch. 
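
A port function of this kind can be pictured as a dictionary from counter values to messages. The sketch below is a simplification under assumptions: the class name Port and its methods are hypothetical, and each port keeps its own counter, whereas the model described here keeps counters per switch direction.

```python
class Port:
    """A switch port: a partial injection from counter values to messages."""

    def __init__(self):
        self.entries = {}  # counter value -> (address, message kind)
        self.counter = 0   # next unused timestamp

    def push(self, message):
        # Guards: the counter is not yet in DOM(port) and the message is not
        # yet in RAN(port), so the mapping stays injective.
        if self.counter in self.entries or message in self.entries.values():
            return False
        self.entries[self.counter] = message
        self.counter += 1
        return True

    def timestamp(self, message):
        # Inverse lookup; well defined because the mapping is injective.
        for stamp, msg in self.entries.items():
            if msg == message:
                return stamp
        return None
```

Comparing two timestamps obtained from the same port then tells which message entered first.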

Informally, the datawrite event takes an address which has not been written, 
and adds it to the s1i1 set paired with its associated DW data as the image 
of the counter. The event also adds the address to the set dw, and increments 
the input counter of the first switch. The DWout1 event takes an address-DW 
pair which has been put into the s1i1 set, but is not in s1out, and adds it to 
the s1out, s2in and dw1 sets, while incrementing the output counter of the first 
switch and the input counter of the second switch. The DWout2 event takes an 
address-DW pair which is in the s2in set, but not yet in the s2o1 set, adds it 
to the s2o1 and the dw2 sets, and increments the output counter of the second 
switch. 

The following sequence of events describes how a datawrite message moves 
through the system. First, a new address ad is selected which has not been used 
previously. A new address is unused if there are no datawrite messages with 
that address in the memory, mem, or the input of switch one, s1i1. Having 
chosen such an address, a message is created which pairs the address with the 
value produced for that address, prod(ad). The address-value pair is given a 
timestamp cs1in. The timestamp marks the order in which messages arrived at 
a given switch. 

datawrite = any ad where 
    ad ∈ AD ∧ ad ∈ DOM(prod) ∧ ad ∉ DOM(mem) ∧ 
    (ad ↦ DW) ∉ RAN(s1i1) ∧ ad ∉ dw ∧ 
    cs1in ∉ DOM(s1i1) 
then 
    mem := mem ∪ {ad ↦ prod(ad)} || 
    s1i1 := s1i1 ∪ {cs1in ↦ (ad ↦ DW)} || 
    dw := dw ∪ {ad} || 
    cs1in := cs1in + 1 
end; 

Next, the DWout1 event describes the movement of a datawrite message 
from the output of switch 1, s1out, to the input of switch 2, s2in. The third 
conjunct in the precondition ensures that more messages have arrived at switch 
1 than have been output by switch 1. While this property is intuitively obvious, 
it ensures that switch 1 does not forward messages before receiving them. The 
remaining conjuncts ensure that the message is not forwarded more than once. 
If the guard is satisfied, DWout1 moves the message into the output of switch 
1 and into the input of switch 2. Timestamps are incremented and associated 
with the message in each port. 

DWout1 = any ad where 
    ad ∈ AD ∧ ad ∈ DOM(prod) ∧ cs1out < cs1in ∧ 
    cs1out ∉ DOM(s1out) ∧ cs2in ∉ DOM(s2in) ∧ 
    (ad ↦ DW) ∈ RAN(s1i1) ∧ (ad ↦ DW) ∉ RAN(s1out) ∧ 
    ad ∈ dw ∧ ad ∉ dw1 
then 
    dw1 := dw1 ∪ {ad} || 
    s1out := s1out ∪ {cs1out ↦ (ad ↦ DW)} || 
    cs1out := cs1out + 1 || 
    s2in := s2in ∪ {cs2in ↦ (ad ↦ DW)} || 
    cs2in := cs2in + 1 
end; 

Finally, the DWout2 event moves the message from the output port of switch 
2 to the input port of the data (memory) device, dw2. The guard and action 
of this event are similar to those of the DWout1 event shown above: 

DWout2 = any ad where 
    ad ∈ AD ∧ ad ∈ DOM(prod) ∧ (ad ↦ DW) ∈ RAN(s2in) ∧ 
    (ad ↦ DW) ∉ RAN(s2o1) ∧ cs2out ∉ DOM(s2o1) ∧ 
    ad ∈ dw1 ∧ ad ∉ dw2 ∧ cs2out < cs2in 
then 
    cs2out := cs2out + 1 || 
    s2o1 := s2o1 ∪ {cs2out ↦ (ad ↦ DW)} || 
    dw2 := dw2 ∪ {ad} 
end; 

The dataread and flagwrite messages are modeled similarly, except that the 
dataread address-DR pair is initially added to the s1i2 set, and the flagwrite 
address-FW pair is finally added to the s2o2 set. Each message passes through 
both switches, except for the flagread messages, which are not required to pass 
through the switches to maintain p/c. 

The guard of compdataread is modified to require that ad ∈ dr2. This guarantees 
that the dataread message has passed through the second switch before 
the completion can be sent and received by the Consumer device. 

Since the flagread messages are not required to pass through the switches to 
guarantee the p/c property, the behavior of the flagread messages maintains the 
same level of abstraction that they have in the third PCI model through the rest 
of the refinement. 

The invariant requires datawrite and flagwrite messages to have passed through 
an input port before they pass through an output port. The invariant also states 
that a flagwrite message cannot enter the second switch before its corresponding 
datawrite message. 

5.5 Priority and Reordering 

The second RapidIO refinement models priority, and allows the switches to re- 
order messages based on priority, as described in Section 3.2. An event assigns 
each message a priority of either 0, 1, or 2, by assigning the message to a function 
which maps address and message pairs to the set {0,1,2}. The function will 
never assign an address and message pair to 3, as 3 is reserved in the RapidIO 
specification for message retries. 

The events that model messages leaving switches (DWout1, DWout2, FWout1, 
etc.) are guarded by statements like the following (from the guard of the DWout1 
event): 

DWout1 = ANY ad where 
    ∀(msg).(msg ∈ AD × DATA ∧ msg ∈ ran(s1i1) ∧ 
        msg ∉ ran(s1out) ∧ msg ∉ ran(s2in) ∧ 
        s1i1⁻¹(msg) < s1i1⁻¹(ad ↦ DW)) 
      ⇒ prior(msg) < prior(ad ↦ DW) 
    (other conditions omitted for clarity) 
then 
    dw1 := dw1 ∪ {ad} || 
    s1out := s1out ∪ {cs1out ↦ (ad ↦ DW)} || 
    cs1out := cs1out + 1 || 
    s2in := s2in ∪ {cs2in ↦ (ad ↦ DW)} || 
    cs2in := cs2in + 1 
end; 

This particular statement requires that there be no other message with higher 
priority waiting in the queue before a datawrite message can be passed on into 
the next queue. This allows messages with higher priority to bypass messages 
with lower priority. If the guard is satisfied, the event moves the message as 
before. 
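
The effect of this guard can be tried out on a small queue. The following Python sketch is an illustration under assumptions: the function name may_leave and the dictionary layout are hypothetical, the queue holds only messages that have entered the input port and not yet left, and the comparison in the guard is read as strict, so a message may only overtake messages of strictly lower priority.

```python
def may_leave(waiting, prior, candidate):
    """Decide whether candidate may leave the switch.

    waiting: dict from timestamp to message, for messages still queued.
    prior:   dict from message to its priority (0, 1 or 2).
    """
    # Timestamp at which the candidate entered the port.
    stamp = next(t for t, m in waiting.items() if m == candidate)
    # Every message queued ahead of the candidate must have lower priority.
    return all(
        prior[m] < prior[candidate] for t, m in waiting.items() if t < stamp
    )


waiting = {0: ("a0", "DW"), 1: ("a1", "DR"), 2: ("a2", "FW")}
prior = {("a0", "DW"): 1, ("a1", "DR"): 1, ("a2", "FW"): 2}
```

With these values the message at the head of the queue may leave, the second message may not pass the first because both have priority 1, and the third may bypass both because its priority is higher.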



5.6 Interfacing PCI Devices Across a RapidIO Network 

The final step in our refinement was to model the way in which our 
PCI-to-RapidIO adapters would work in the event of a retry being issued for a datawrite 
message. In PCI, if a message cannot reach a device, the bridge will resend the 
message until the device is able to accept it. In RapidIO, if a device cannot 
accept a message, it drops the packet, and sends a retry message to the device 
where the message originated. The danger of this characteristic of the RapidIO 
specification, with respect to the p/c property, is that, if a datawrite message 
is dropped, then there is no guarantee that the corresponding dataread message 
will not be accepted by the data device before the retried datawrite message. 
Our solution is to create a set which holds all datawrite and dataread messages 
between the second switch and the data device. The dataread message is not 
passed from this intermediate set, and into the data device, before its corre- 
sponding datawrite message. In the event of a retry, the datawrite message waits 
in the adapter set, until it completes, and then the dataread message is free to 
pass through to the data device. 

The new sets are ad1, ad2, retries, resent, dw3 and dr3. The adapters are 
represented by ad1 and ad2, retries is the set of all addresses for which datawrite 
messages have been retried, and resent is the set of all addresses for which the 
retried messages have been successfully resent. The sets dw3 and dr3 are the sets 
of all addresses for which datawrite and dataread messages have passed through 
the adapter. Four events (DWout3, DRout3, retry, and resend) are added to this 
model. 

Note that the flagwrite messages are not modeled to pass through the adapter. 
This is because a retry on a flagwrite message will not violate the p/c property. 
The p/c property is only guaranteed if the Consumer device is able to read the 
flag set by the Producer. 

The DWout2 and DRout2 events are changed to add the address-data pair to 
the ad2 set, in addition to the substitutions of the previous model. The DWout3 
event adds a message to the dw3 set, and is guarded by the statement 

DWout3 = ANY ad where 
    ¬(ad ∈ retries ∧ ad ∉ resent) 
    (other conditions omitted for clarity) 
then 
    dw3 := dw3 ∪ {ad} 
end; 

which means that if a message has been retried, then it must have been 
successfully resent before the adapter can send it to the Data device. The DRout3 event 
adds a message to the dr3 set only if ad ∈ dw3, that is, only if the corresponding 
datawrite message has been successfully received by the Data device. 

The other two new events are retry and resend. The retry event takes an 
address which is in the ad2 set as an address-DW pair, and adds the address to 
the retries set. The resend event takes an arbitrary address (using the choice 
operator) from the retries set and adds it to the resent set. Since the actual 
mechanism of resending messages does not affect the p/c property, our resend 
event does not necessitate any refinement. Invariants describing the retries and 
resent sets are added to this model. 
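
The interplay of the four adapter events can be sketched in a few lines of Python. This is an illustration under assumptions: the class and method names are hypothetical, the guards elided in the text as "other conditions" are filled in with the simplest plausible checks (the message must have reached ad2 and must not have been delivered before), and resend takes an explicit address instead of choosing one arbitrarily.

```python
class Adapter:
    """Sketch of the adapter sets between switch 2 and the Data device."""

    def __init__(self):
        self.ad2 = set()      # (address, kind) pairs held by the adapter
        self.retries = set()  # addresses whose datawrite has been retried
        self.resent = set()   # addresses whose retried datawrite was resent
        self.dw3 = set()      # datawrites passed on to the Data device
        self.dr3 = set()      # datareads passed on to the Data device

    def arrive(self, ad, kind):
        self.ad2.add((ad, kind))

    def retry(self, ad):
        if (ad, "DW") not in self.ad2:
            return False
        self.retries.add(ad)
        return True

    def resend(self, ad):
        if ad not in self.retries:
            return False
        self.resent.add(ad)
        return True

    def dw_out3(self, ad):
        # Guard from the text: not (ad in retries and ad not in resent).
        blocked = ad in self.retries and ad not in self.resent
        if blocked or (ad, "DW") not in self.ad2 or ad in self.dw3:
            return False
        self.dw3.add(ad)
        return True

    def dr_out3(self, ad):
        # Guard from the text: the matching datawrite is already in dw3.
        if ad not in self.dw3 or (ad, "DR") not in self.ad2 or ad in self.dr3:
            return False
        self.dr3.add(ad)
        return True
```

In this sketch a dataread that arrives while its datawrite is being retried stays in the adapter until the datawrite has been resent and delivered.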

6 Proof Metrics and Comparison 

The refinement of RapidIO from the transaction ordering property was 
completed by two researchers over a five-month period. The proof involved a total 
of 6133 obvious (solved by AtelierB’s predicate prover) and 952 non-obvious 
(requiring interactive proof) proof obligations. Of these proof obligations, only 
736 obvious and 59 non-obvious proof obligations were reused without 
modification from the previous PCI proof. In this project, proving refinement for 
each successive, more concrete model required discharging more proof 
obligations than the last. More than half of the non-obvious obligations were discharged using 
AtelierB’s automated decision procedures based on predicate calculus. A decision 
procedure based on counter logic with uninterpreted functions (CLU) might have 
solved more proof obligations [BLS02]. CLU is particularly well-suited for this 
problem domain because models of queues and buffers often include simple 
arithmetic to denote message ordering. 
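
For orientation, the reuse figures above can be turned into a rate. The short Python calculation below uses only the four counts reported in this section; the variable names are ours.

```python
# Proof obligation counts reported for the RapidIO refinement.
obvious, non_obvious = 6133, 952
reused_obvious, reused_non_obvious = 736, 59

total = obvious + non_obvious
reused = reused_obvious + reused_non_obvious
reuse_rate = reused / total

print(f"{total} obligations, {reused} reused ({reuse_rate:.1%})")
# prints "7085 obligations, 795 reused (11.2%)"
```

So roughly one proof obligation in nine carried over unchanged from the PCI proof.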

The five months included time for both researchers to learn to use AtelierB. 
The forced structure of event-based modeling and incremental refinement 
required the researchers to engineer their models and proofs to simplify proof 
obligations and break complex refinements into simpler steps. In contrast, three 
expert users failed to complete a similar proof for PCI using a higher-order logic 
theorem prover after a combined 18 months of effort [MHJG00]. 

Although the general purpose higher-order logic theorem prover did provide a 
superior interactive proof environment due to better decision procedure support 
and tactical programming support, it did not provide the same rigid structure 
in which to engineer models to manage complexity. Instead, the refinement was 
done in one step and the resulting proofs were extremely difficult. Of course, the 
construction of a theory of event-based refinement in a general purpose higher- 
order logic theorem prover would combine the rigid proof structure with decision 
procedures and tactical programming support. We did not find any compelling 
reason to favor set-based (as in AtelierB) or type-based specification languages 
(as in an HOL tool). Both were adequate for this problem. 

7 Conclusion 

When describing the results of a formal verification project, it is important to 
precisely state what was proved, at what level of abstraction and using which 
proof technique¹. Using refinement to show compatibility allows a precise 
statement of what was proved and a well-defined level of abstraction at which it was 
proved. More specifically, the statement of compatibility is the behavior of the 
most abstract common ancestor and the level of abstraction is the least abstract 
common successor. 

In this case study, the statement of compatibility is the first abstract model 
in which the producer creates unique data values and the consumer reads them. 
The level of abstraction is the third model in which the producer writes the data 
to a data location, sets a flag at a third location and the consumer reads the flag 
then reads the data. 

Our use of refinement to show compatibility is similar to Méry and Cansell’s 
use of refinement to study feature interaction [CM01]. When studying feature 
interaction, the refinement chain is split into two branches, called horizontal and 
vertical refinement, but is rejoined at the next concrete level. When showing 
compatibility, we also split the refinement into two branches, one for each pro- 
tocol, but do not rejoin them at the next concrete level. This is because we do 
not allow both protocols to have arbitrary interactions in the same system. Our 
lowest-level RapidIO model does allow PCI devices to be connected at the periphery of 
the network, but does not allow a heterogeneous network of PCI and RapidIO 
interconnects. 

¹ A point clearly articulated by Cohn [Coh89] in the context of hardware verification, 
but which applies more generally. 

We had hoped to reuse more proofs between the two refinement sequences. 
Instead, non-compatible details used early in the PCI refinement chain prevented 
their reuse in the RapidIO refinement chain. One possible method of increasing 
proof re-use would be to explore methods of pushing complexity to the earlier 
models of the first refinement. If the preliminary models of the original refinement 
chain were complex enough to require more difficult proofs, then the refinement 
of the second system would share those difficult proofs, and reduce total effort 
involved in the process. The difficulty will be to balance complexity with a level 
of abstraction that describes both specifications. 

Abstract. In this paper, we show that one can “deep-embed” the Java 
bytecode language, a fairly complicated language with a rich semantics, 
into the first order logic of ACL2 by modeling a realistic JVM. We show 
that with proper support from a semi-automatic theorem prover in that 
logic, one can reason about the correctness of Java programs. This rea- 
soning can be done in a direct and intuitive way without incurring the 
extra burden that has often been associated with hand proofs, or proofs 
that make use of less automated proof assistance. We present proofs for 
two simple Java programs as a showcase. 



1 Introduction 

In order to reason about software/hardware artifacts mathematically, we need 
to represent the artifacts as mathematical objects. We often formalize them by 
assigning a precise semantics to the underlying language constructs or hardware 
primitives. 

In cases where there exists an axiomatic semantics for the language, we can 
reason about the artifact directly using axioms and specialized derivation rules. 
A typical example is Hoare logic [7]. 

However, such an approach makes it hard to use existing general purpose the- 
orem provers such as ACL2 [13] and PVS [3], because for each different logical 
system, a new computer aided reasoning engine must be constructed. Construct- 
ing a specialized theorem prover comparable to current mature general purpose 
ones is often time consuming and error prone. Generic theorem proving environ- 
ments such as Isabelle [18] prove to be useful in this setting, because Isabelle 
can be configured to function as a specialized theorem prover for different for- 
malisms. Alternatively, if we can embed the language into the formalism of a 
powerful general purpose theorem prover, we can use that theorem prover for 
program verification projects. We think that this approach is also practical. 

There are two common choices for formalizing a program artifact in the logic 
of a theorem prover. In a “shallow embedding” one describes a process by which a 
conjecture about a given program may be converted to an “equivalent” formula. 
Neither the programs (the original forms in the old syntax) nor the process are 
defined within the logic - they are meta-level entities. In a “deep embedding”, 
programs and their environments are logical objects that are related by functions 
and relations formally defined within the logic. Usually, the syntax of the 
original programs is preserved. The semantics of the basic language constructs 
are formalized instead of the semantics of specific programs. 

Each approach has its pros and cons. Shallow embedding requires less logical 
infrastructure and often produces simpler conjectures to prove. Deep embedding, 
however, allows one to reason formally not just about a given program but about 
relations between programs and properties of the semantics itself. For example, 
deep embedding a program with a semantics for the underlying programming 
language allows the user to reason about properties shared by a set of programs. 
Deep embedding also allows the user to derive new proof rules as theorems. 
However, much logical manipulation must occur to wade through the details of 
the semantics. Automation is highly desirable and brings new capabilities, such 
as simulation, symbolic evaluation, and other analysis tools. 
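
The distinction can be illustrated on a toy stack language. The Python sketch below is only an analogy under stated assumptions: Python stands in for the logic, the three-instruction language and all names (step, run, PROGRAM, shallow_program, compose_property) are invented for this example, and nothing here reflects the actual JVM model.

```python
# Deep embedding: a program is a data object, and one interpreter gives
# every program its meaning.
def step(instr, stack):
    op = instr[0]
    if op == "push":
        return stack + [instr[1]]
    if op == "add":
        return stack[:-2] + [stack[-2] + stack[-1]]
    if op == "mul":
        return stack[:-2] + [stack[-2] * stack[-1]]
    raise ValueError(f"unknown instruction {instr!r}")


def run(program, stack=None):
    stack = [] if stack is None else stack
    for instr in program:
        stack = step(instr, stack)
    return stack


PROGRAM = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]


# Shallow embedding: the same program is translated, outside the "logic",
# into a host-language expression; the instruction list no longer exists.
def shallow_program():
    return [(2 + 3) * 4]


# A statement about the semantics itself, expressible only when programs
# are objects: running p and then q equals running their concatenation.
def compose_property(p, q):
    return run(q, run(p)) == run(p + q)
```

Both embeddings agree on the result of this one program, but only the deep embedding lets a property such as compose_property quantify over programs.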

It is this paper’s thesis that Java program verification via a deep embedding 
of the JVM into the logic of ACL2 is a viable approach. In fact, we believe that 
with the proper support from a powerful semi-automatic theorem prover, the 
deep embedding approach is better than the shallow embedding approach in the 
sense that it brings more assurance of the verification result without incurring 
much extra burden. 

In section 2, we present our deep embedding of a full-featured JVM into 
ACL2. The executability of ACL2 models allows one to use such a complete 
deep embedding as a JVM simulator. In section 3, we present correctness 
proofs for two simple Java programs to demonstrate the approach and illustrate 
some useful techniques in handling a deep embedding. In section 4, we review 
other work and comment on the proof effort required by our method and explain 
briefly the limitations of our work. We summarize and conclude in section 5. 

2 Deep Embedding a JVM in ACL2 

We wrote a precise model of the JVM in ACL2 to formally capture the meaning 
of Java bytecode programs. The JVM model is based on the JVM specification. 
We follow the KVM, a C implementation of the JVM, as a reference model in 
our “implementation”¹. 

ACL2 is an applicative (side-effect free) subset of Common Lisp. Our JVM 
model can be executed as a Lisp program. It is implemented with around ten 
thousand lines of Lisp (ACL2) in about 25 modules. It implements most features 
of a JVM such as dynamic class loading, thread synchronization via monitors, 
together with 21 out of the 41 native methods defined in Java 2 Micro Edition’s 
CLDC library [22]. The features that are missing are the “reflection” capability 
in the full JVM, user defined class loaders, floating point arithmetic, and native 
methods related to some I/O operations. 

Realistic Java programs can execute on the model. We expect to run a suit- 
able subset of some conformance test suite at some point. The details of the 

¹ In the process, we discovered several implementation errors in the KVM. Some were 
already known to Sun; others were forwarded to the KVM development team. 
model are described in the paper [12], which we presented at the Workshop on 
Interpreters, Virtual Machines and Emulators (IVME) 2003, affiliated with PLDI².

2.1 Motivation to Embed a JVM 

We are interested in applying theorem proving techniques to software 
verification projects. In particular, we are interested in reasoning about the properties 
of the Java virtual machine and Java software executing on the JVM. 

This is one of the reasons that we decided to deep-embed the Java bytecode 
language via a JVM model. The other reason is that we feel more confident 
in our ability to formalize the semantics of the bytecode language of the Java 
Virtual Machine than our ability to correctly assign meanings to specific Java 
programs or the Java programming language. 

As with most imperative programming languages, the semantics of Java are hard 
to formalize directly. The object oriented features such as method overriding, 
dynamic method resolution, access permissions, and constructs such as inner 
classes present significant challenges. 

As expressed in our position paper [11], the JVM bytecode is simpler and 
more precisely defined than Java. We therefore define the semantics of the 
bytecode language with an operational JVM interpreter. We reason about Java 
programs by reasoning about the corresponding bytecode, produced by javac, on the 
JVM model. This approach was demonstrated by Yu [1], who used Nqthm, the 
predecessor of ACL2, to reason about C via gcc and a model of the Motorola 68020. 

Because the model is formally defined in the logic, we can also reason about 
it independent of the consideration of any particular program. This allows us to 
derive new proof rules from the semantics, as well as to explore the implications 
of semantics, i.e. properties of the JVM itself. Both activities increase our under- 
standing of and confidence in the semantics; and both activities are supported by 
machine-checked reasoning rather than informal reasoning. Finally, the bytecode 
analyzed is more closely related to what is actually executed than the original 
Java. In summary, we regard deep embedding as offering higher assurance than 
shallow embedding. 

2.2 The JVM Model in ACL2 

The completeness of our JVM embedding determines the range of Java programs 
that we can reason about as well as the relevance of our formal statements 
about the Java programs. Our model is fairly complete - it is a realistic JVM 
simulator that executes most Java programs that use neither I/O nor 
floating-point operations. 

Since ACL2 is applicative, we have to model the JVM state explicitly. All
aspects of the machine state are encoded explicitly in one logical object denoted
by a term. A JVM state in this model is a seven-tuple consisting of a global
program counter, a current thread register, a heap, a thread table, an internal
class table that records the runtime representations of the loaded classes, an
environment that represents the source from which classes are to be loaded, and
a fatal error flag used by the interpreter to indicate an unrecoverable error.

^ A revised version was accepted for publication in a special issue of the journal
“Science of Computer Programming” for IVME’03.

The thread table is a table containing one entry per thread. Each entry has 
a slot for a saved copy of the global program counter, which points to the next 
instruction to be executed the next time this thread is scheduled. Among other 
things, the entry also records the method invocation stack (or “call stack”) of the 
thread. The call stack is a stack of frames. Each frame specifies the method being 
executed, a return pc, a list of local variables, an operand stack, and possibly a 
reference to a Java object on which this invocation is synchronized. 

The heap is a map from addresses to instance objects. The internal class 
table is a map from class names to descriptions of various aspects of each class, 
including its direct superclass, implemented interfaces, fields, methods, access
flags, and the bytecode for each method.

All of this state information is represented as a single Lisp object composed of 
lists, symbols, strings, and numbers. Operations on state components, including 
determination of the next instruction, object creation, and method resolution, 
are all defined as Lisp functions on these Lisp objects. 
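To make the shape of this state concrete, here is a minimal Python sketch of a seven-component state with purely functional updates. It is not the ACL2 model: the class names, field names, and the `set_pc` helper are assumptions chosen for illustration, and only the list of components is taken from the description above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    method: tuple               # hypothetical (class name, method name) pair
    return_pc: int
    local_vars: tuple = ()
    operand_stack: tuple = ()   # top of stack is the last element


@dataclass(frozen=True)
class Thread:
    thread_id: int
    saved_pc: int
    call_stack: tuple = ()      # Frames, top frame last


@dataclass(frozen=True)
class State:
    pc: int                     # global program counter
    current_thread: int         # current thread register
    heap: tuple = ()            # (address, object) pairs
    thread_table: tuple = ()    # one Thread entry per thread
    class_table: tuple = ()     # runtime representations of loaded classes
    env: tuple = ()             # source from which classes are loaded
    fatal_error: bool = False   # set on an unrecoverable error


def set_pc(s: State, pc: int) -> State:
    """Return a new state with the given pc; the argument is left untouched."""
    return replace(s, pc=pc)
```

Because every component is immutable, an operation such as `set_pc` returns a new value instead of modifying its argument, which mirrors the applicative style that ACL2 requires.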

As a concrete example of how a piece of state is represented, the following 
entry is taken from an actual thread table when we used our model to exe- 
cute a multi-threaded program for computing factorial. A semicolon (;) begins 
a comment extending to the end of the line. 



(THREAD 0                                      ; thread id is 0
  (SAVED-PC . 0)                               ; slot for saved pc
  (CALL-STACK
    (FRAME (RETURN_PC . 7)                     ; pc to return to
           (OPERAND-STACK)                     ; empty operand stack
           (LOCALS 104)
           (METHOD-PTR "FactHelper" "<init>" ...)
           (SYNC-OBJ-REF . -1))
    (FRAME (RETURN_PC . 18)
           (METHOD-PTR "FactHelper" "compute"...)
           (SYNC-OBJ-REF . -1))
    ...)
  (STATUS THREAD_ACTIVE)                       ; thread state
  (MONITOR . -1)                               ; lock
  (MDEPTH . 0)                                 ; entering count
  (THREAD-OBJ . 55))                           ; object rep in heap



Each thread table entry has slots for recording a thread id, a pc, a call stack, 
a thread state, a reference to the monitor, the number of times the thread has 
entered the monitor, and a reference to the Java object representing the thread 
in the heap. 
The semantics of the JVM instructions are modeled operationally as state
transition functions. Here is the state transition function for the IDIV instruction.

(defun execute-IDIV (inst s)
  (let ((v2 (topStack s))
        (v1 (secondStack s)))
    (if (equal v2 0)
        (raise-exception "java.lang.ArithmeticException" s)
      (advance-pc
       (pushStack (int-fix (truncate v1 v2))
                  (popStack (popStack s)))))))

Here, inst is understood to be a parsed IDIV instruction. Advance-pc is a
Lisp macro to advance the global program counter by the size of the instruction.
PushStack pushes a value on the operand stack of the current frame (the top
call frame of the current thread) and returns the resulting state. When the
item on the top of the operand stack of the current frame is zero, the output
of execute-IDIV is a state obtained from s by raising an exception of type
java.lang.ArithmeticException. If the top item is not zero, the resulting state
is obtained by changing the operand stack in the current frame and advancing
the program counter. The operand stack is changed by pushing a certain value
(described below) onto the result of popping two items off the initial operand
stack. The value pushed is the twos-complement integer represented by the
low-order 32 bits of the integer quotient of the second item on the initial
operand stack divided by the first item on it. In ACL2, the function truncate
returns an integer quotient rounded toward 0.
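The arithmetic in this description can be checked with a small Python sketch. It models only the operand stack (as a list whose last element is the top), not the full state; the names `int_fix`, `truncate` and `execute_idiv` echo the ACL2 functions, but the bodies are our own illustration and assume the usual 32-bit twos-complement reading of int-fix.

```python
def int_fix(x: int) -> int:
    """Interpret the low-order 32 bits of x as a twos-complement integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def truncate(a: int, b: int) -> int:
    """Integer quotient rounded toward zero (Python's // rounds toward -inf)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def execute_idiv(stack: list) -> list:
    """Return the new operand stack after IDIV; the input list is not modified."""
    v2, v1 = stack[-1], stack[-2]
    if v2 == 0:
        # Stand-in for raising java.lang.ArithmeticException in the model.
        raise ZeroDivisionError("java.lang.ArithmeticException")
    return stack[:-2] + [int_fix(truncate(v1, v2))]
```

The call to `int_fix` matters in exactly one case: dividing the most negative 32-bit integer by -1 gives a quotient that does not fit in 32 bits, and the wrapped result is the dividend itself.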

The top level interpreter loop is modeled as follows:

(defun run (sched s)
  (if (endp sched)
      s                                   ; end of schedule
    (let ((nid (car sched))               ; else
          (cid (current-thread s)))
      (if (equal cid nid)
          (run (cdr sched) (step s))      ; execute one step
        (run (cdr sched)
             (loadExecutionEnvironment
              nid                         ; proper thread context switch
              (storeExecutionEnvironment s)))))))

Our JVM model takes a “schedule” (a list of thread ids) and a state as the 
input and repeatedly executes the next instruction from the thread as indicated 
in the schedule, until the schedule is exhausted. 

The scheduling policy is thus left unspecified. Any schedule can be simulated.
However, to use the model as an execution engine without providing a schedule
list explicitly, we have implemented some simple scheduling policies. One of them
is a not-very-realistic round-robin scheduling algorithm, which reschedules
after executing each bytecode instruction.
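The schedule-driven loop can be imitated in a few lines of Python. This is a sketch under simplifying assumptions: the state holds only a current thread id, a global pc and one saved pc per thread; `step` is a stand-in that merely advances the pc; and `store_env` and `load_env` are hypothetical counterparts of storeExecutionEnvironment and loadExecutionEnvironment. As in the defun of run, a schedule entry that names a different thread is consumed by the context switch alone.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    current: int        # id of the current thread
    pc: int             # global program counter
    saved_pcs: tuple    # saved pc of each thread, indexed by thread id


def step(s: State) -> State:
    """Stand-in for executing one bytecode instruction."""
    return replace(s, pc=s.pc + 1)


def store_env(s: State) -> State:
    """Save the global pc into the current thread's slot."""
    saved = list(s.saved_pcs)
    saved[s.current] = s.pc
    return replace(s, saved_pcs=tuple(saved))


def load_env(tid: int, s: State) -> State:
    """Make tid the current thread and restore its saved pc."""
    return replace(s, current=tid, pc=s.saved_pcs[tid])


def run(sched, s: State) -> State:
    """Consume the schedule: step the current thread or switch context."""
    for nid in sched:
        if nid == s.current:
            s = step(s)
        else:
            s = load_env(nid, store_env(s))
    return s
```

The ACL2 definition is recursive on the schedule; the loop above computes the same thing for this toy state because each iteration corresponds to one recursive call on the rest of the schedule.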
Before concluding this section, we observe that the defun of run (and of each
of the other functions shown above) can be thought of in either of two ways. First, 
it defines a side-effect free Lisp program which can be executed on concrete 
data. Second, it introduces a new logical definitional equation which can be 
used to prove theorems about the newly defined function symbol. Preserving the 
view that we are “merely” defining an executable model often provides valuable 
clarity. Executing the model often provides assistance in the search for true 
statements about programs and in the search for proofs. In some sense, the 
“embedding” is so direct that it is transparent, i.e. we are reasoning about the 
JVM directly. 

3 Java Program Verification 

With our choice of a deep embedding of the Java bytecode language, reason- 
ing about any Java bytecode program implies that we need to deal with the 
complexity of the JVM in addition to the program itself. The task seems to be 
formidable. This additional complexity is considered one of the major drawbacks 
of the deep embedding approach. 

We acknowledge that deep embedding adds extra complexity in the verifi- 
cation of programs. But if one can accomplish the program verification task at 
this level, we believe that additional confidence is gained. 

The central remaining question is whether one can reduce the “extra” com- 
plexity to an acceptable level. It is our experience with the JVM and ACL2 
that one can achieve this reduction by configuring the rewriting engine of ACL2 
using lemma libraries. Such configuration needs to be done only once for a class 
of programs. 

In this section, we present proofs of two simple programs to show how we 
manage the complexity in ACL2. We show the proof for the first program in 
some detail and refer readers to the actual proof scripts for comments and other 
details in the supporting material [6]. 



3.1 ADD1 Program

The first program is trivial:

public class First {
  public static void main(String[] args) {
    int i=1;
    int j=i+1;
    i=j;
    return; }}

The main method is straight line code that only modifies the operand stack
and local variables in the current call frame, i.e. the topmost activation record
from the call stack of the current thread. With this example, we illustrate how
we can reason about programs and segments of programs which only manipulate
the current call frame.

Our tool jvm2acl2 transforms First.class into the following format,
which directly corresponds to the class file format [23].

'(class "First"                              ; class name is First
   "java.lang.Object"                        ; superclass is java.lang.Object
   (fields)                                  ; list of field definitions
   (methods                                  ; list of method definitions
     (method "<init>"
       ....)
     (method "main"                          ; method name
       (parameters (array (class "java.lang.String")))
       (returntype void)
       (accessflags *class* *public* *static*)
       (code
         (max_stack 2) ...
         (parsedcode
           (0 (iconst_1))   ;; *Note: (0 (iload_2))
           (1 (istore_1))
           (2 (iload_1))
           (3 (iconst_1))
           (4 (iadd))
           (5 (istore_2))
           (6 (iload_2))
           (7 (istore_1))
           (8 (return))
           (endofcode 9))
         (Exceptions )
         (StackMap )))))
   ....)

This logical constant represents the First.class file. A list of such class
constants together with the JVM interpreter gives the semantics of the original 
Java program. For this program, the semantics of the main method only depends 
on the JVM interpreter and this particular class itself; for more complicated 
programs, the meaning of a user-defined class often depends on other classes. 

To make the example slightly more interesting, we change by hand the first
instruction, (0 (iconst_1)), to (0 (iload_2))^. We prove that by starting in
a state where the pc is 0 and executing 7 steps according to a round robin
scheduling algorithm, we produce a state in which the value in the second slot
of the locals is increased by one from its original value. We then describe the
essential steps in configuring ACL2 to derive this.
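The claim can be tested on concrete data with a toy interpreter. The Python sketch below covers only the handful of instructions in the modified main method and tracks just the pc, the operand stack and the locals of one frame; it is an illustration written for this passage, not the JVM model, and the helper names are assumptions.

```python
def int_fix(x: int) -> int:
    """Interpret the low-order 32 bits of x as a twos-complement integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


# Modified main method: instruction 0 is iload_2 instead of iconst_1.
# Every instruction is one byte long, so list index and pc coincide.
CODE = ["iload_2", "istore_1", "iload_1", "iconst_1", "iadd",
        "istore_2", "iload_2", "istore_1", "return"]


def step(pc, stack, local_vars):
    """Execute the instruction at pc and return the new (pc, stack, locals)."""
    op = CODE[pc]
    stack, local_vars = list(stack), list(local_vars)
    if op.startswith("iload_"):
        stack.append(local_vars[int(op[-1])])
    elif op.startswith("istore_"):
        local_vars[int(op[-1])] = stack.pop()
    elif op == "iconst_1":
        stack.append(1)
    elif op == "iadd":
        b, a = stack.pop(), stack.pop()
        stack.append(int_fix(a + b))
    elif op == "return":
        return pc, stack, local_vars      # halt: nothing left to execute
    else:
        raise ValueError("unsupported instruction: " + op)
    return pc + 1, stack, local_vars


def run(n, pc, stack, local_vars):
    """Execute n steps from the given frame contents."""
    for _ in range(n):
        pc, stack, local_vars = step(pc, stack, local_vars)
    return pc, stack, local_vars
```

Running 7 steps from pc 0 with an integer `old` in local slot 2 leaves `int_fix(old + 1)` in that slot. Testing a few values of `old` this way is no substitute for the theorem, which covers every integer `old` and every state equivalent to the initial one.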

^ In fact, this makes the class file fail to pass bytecode verification. Here we are trying
to make the proof a little more interesting by proving an assertion of the form
∀i. P(i).
The first step is to identify the appropriate abstractions of JVM executions
and formalize those concepts properly. For example, consider the intuitive
understanding of what the “next instruction” is. In our JVM model, such a concept
is complex because the state is complex. The next instruction of a given state is 
the instruction that resides at a certain offset within the bytecode of the current 
method, where the offset is given as the value of the pc field of the state; and 
the current method is identified by consulting the current class table using the 
method identifier in the activation record of the current thread. One must also 
consider special conditions, such as when the current thread does not exist or 
has been stopped by another thread. Such complexity is reduced by defining a 
named function of state, next-inst, and using it consistently within the model 
so that the above details are not exposed. We regard this as just good model- 
ing practice. We typically configure ACL2 so as not to expand the definitions 
of these abstractions (“disabling” the associated rules in ACL2’s database). We 
will rely only on a set of properties of these operations on “states of interest”,
which we prove before we disable the definition.
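The following Python sketch suggests why the “next instruction” deserves a named function. The state layout (nested dictionaries with keys such as "thread_table" and "call_stack") is entirely hypothetical and far simpler than the model; the point is only that the lookup chains through several components and has special cases that callers should not have to repeat.

```python
def next_inst(state):
    """Return the instruction at the current pc, or None if there is none."""
    thread = state["thread_table"].get(state["current_thread"])
    if thread is None or thread["status"] != "active":
        return None                          # no such thread, or it was stopped
    frame = thread["call_stack"][-1]         # current activation record
    cls = state["class_table"].get(frame["class"])
    if cls is None:
        return None                          # class is not loaded
    code = cls["methods"][frame["method"]]   # maps byte offsets to instructions
    return code.get(state["pc"])


# A tiny state running the main method of First at pc 2.
example_state = {
    "pc": 2,
    "current_thread": 0,
    "thread_table": {
        0: {"status": "active",
            "call_stack": [{"class": "First", "method": "main"}]},
    },
    "class_table": {
        "First": {"methods": {"main": {2: ("iload_1",), 3: ("iconst_1",)}}},
    },
}
```

On `example_state` the function returns `("iload_1",)`; if the current thread id names a thread that is absent from the table, it returns `None`.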

The reason that the intuitive informal notion of “next instruction” appears 
simpler is probably because the user evaluates it only on symbolic states for 
which next-inst returns constants. That is, when considering the verification 
of a particular program in thread 0 informally, we do not contemplate whether 
there can be a context switch to a thread, or whether the current activation 
record corresponds to the program of interest. 

In the second step, we formalize the concepts that capture the identified 
domain, i.e. “states of interest”. We prove that in the identified domain, compli- 
cated primitive operations have the simple behavior as expected. To formalize 
this we introduce an equivalence relation on states, equiv-state, that means, 
roughly, “the states are executing the same program.” We are more precise below. 
ACL2’s rewrite engine can use arbitrary equivalences and congruence lemmas 
(which establish that certain functions cannot distinguish “equivalent” input) to 
descend through the subterms of a term and replace occurrences of target terms 
by equivalent terms. 

To cause the next instruction concept to expand only on the states of interest,
we prove the following lemma and then disable the definition of next-inst:

(defthm equiv-state-init-state-next-inst
  (implies (equiv-state s (init-state))
           (equal (next-inst s)
                  (inst-by-offset (pc s) (theMethod)))))

The theorem asserts that for any state running the program of interest (that
in the constant (init-state)), the next instruction can be computed by looking
at a certain offset of the program of interest. This is a trivial theorem to prove.
On states equivalent to (init-state) the body of next-inst can be reduced 
to a constant, namely, the next instruction. Thus, by proving this lemma and 
disabling next-inst, ACL2 will reduce (next-inst s) to a constant instruction 
if s is running the program of interest, but will not change the next-inst term 
otherwise. 
In order to use the just established theorem to rewrite (next-inst s) into
a simpler form, where s is a (round-robin-run s n) term, we need to reason
about the run function with a round robin scheduler. In particular, we need to prove
that there is no context switch as the program steps from one instruction to the
next. To prove there is no context switch, we proved three types of theorems
around the equivalence relation that we identified:

- A congruence on the equivalence relation equiv-state, which asserts that
  the round robin scheduler always picks the same thread if two states are
  equiv-state.

  (defthm round-robin-schedule-equal-in-equiv-state
    (implies (equiv-state s s-equiv)
             (equal (round-robin-schedule s-equiv)
                    (round-robin-schedule s)))
    :rule-classes :congruence)

- A theorem that states the properties of the initial state. In this case, the
  round robin scheduler picks the thread 0 to execute in the initial state.

  (defthm round-robin-schedule-init-state
    (equal (round-robin-schedule (init-state)) 0))

- Theorems that state equivalence is preserved by executing each primitive,
  e.g., pushStack.

  (defthm pushStack-preserves-equiv-state
    (equiv-state (pushStack v s) s))

Having so configured ACL2 by proving these lemmas, JVM execution of straight
line code can be expanded into a composition of primitives by ACL2
automatically. For example,

(defthm round-robin-run-expansion-example
  (implies (and (equiv-state s1 (init-state))
                (equal (pc s1) 2))
           (equiv-state (round-robin-run s1 4)
                        (init-state))))

is proved automatically. The theorem prover expands the (round-robin-run
s1 4) term symbolically step by step using the rewrite rules derived from the
proven theorems.

In this example, starting from pc equals 2, (round-robin-run s1 4) executes
(iload_1), (iconst_1), (iadd), and (istore_2) in sequence. Because every
instruction is one byte, executing 4 instructions results in a term of the
following form, where the pc is 6.







(state-set-pc 6                               ;#
  (popStack                                   ;#
    (state-set-local 2 (topStack ..)          ;# cf. ISTORE_2
      (state-set-pc 5                         ;*
        (pushStack                            ;*
          (int-fix (binary-+ ...))            ;*
          (popStack (popStack                 ;* cf. IADD
            (state-set-pc 4                   ;% cf. ICONST_1
              ....))))))))                    ;$ ILOAD_1


Compare this expected form to one of the intermediate goals generated by ACL2:

Subgoal 1'5'
(IMPLIES
 (AND (EQUIV-STATE S1 (INIT-STATE))
      (EQUAL (PC S1) 2)                       ; pc = 2 in starting state
      (EQUAL 0 (CURRENT-THREAD S1)))
 (EQUIV-STATE
  (POPSTACK                                   ;* cf. partial IADD
   (POPSTACK
    (STATE-SET-PC
     4
     (PUSHSTACK 1                             ;% cf. ICONST_1
      (STATE-SET-PC 3                         ;$
       (PUSHSTACK (LOCAL-AT 1
                            (LOCALS (CURRENT-FRAME S1)))
                  S1))))))                    ;$ cf. ILOAD_1
  (INIT-STATE)))


In Subgoal 1'5', ACL2 has reduced

(equiv-state (state-set-pc 6 (popStack (state-set-local 2 ...)))
             (init-state))

into

(equiv-state (popStack (popStack (state-set-pc 4 ...)))
             (init-state))

after “peeling off” some outer primitives such as (state-set-pc 6 ...).

This shows that reasoning about (next-inst s) is entirely automatic. The
theorem prover “knows” enough to determine the next instruction and then to
execute it as a symbolic execution engine. A formula involving
round-robin-run is thus reduced to a composition of primitives, such as
pushStack, popStack, and state-set-pc. The structure of the composition of
primitives can be traced back to the instructions in the original bytecode
sequence.

The third configuration step is to arrange for the theorem prover to reason 
about compositions of different primitives. This is closely related to the second 
step - identifying conditions under which the primitives behave according to our 
intuitions. 
For example, we have an understanding of the effects of the push and pop
operations on a stack. The following should obviously be true.

(popStack (pushStack v s)) = s

However, the above is not so obvious in a Java program without some implicit
hypothesis. PushStack pushes a value onto the operand stack of the topmost call
frame of the current thread. For the above to be true, we need to explicitly show
(or configure the theorem prover to automatically recognize) that no other part of
the state is changed. A similar problem manifests itself in other places, such
as showing that setting the program counter does not affect the operand stack.
This is the pattern called the “frame” problem in AI research. To describe the
effect of an operation, we need to be explicit not only about what is changed,
but also about what does not change.
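A small Python sketch makes the two kinds of fact visible side by side. The three-field `State` and the function names are assumptions for illustration; in this flat toy state the push/pop identity holds unconditionally, whereas in the full model it needs hypotheses such as the existence of the current thread.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    pc: int
    operand_stack: tuple    # top of stack is the last element
    heap: tuple


def push_stack(v, s: State) -> State:
    """Push v; every other component is carried over unchanged."""
    return replace(s, operand_stack=s.operand_stack + (v,))


def pop_stack(s: State) -> State:
    """Drop the top of the operand stack."""
    return replace(s, operand_stack=s.operand_stack[:-1])


def state_set_pc(pc: int, s: State) -> State:
    """Set the pc; the operand stack and heap are untouched."""
    return replace(s, pc=pc)


s = State(pc=2, operand_stack=(10,), heap=("obj",))

# What changes: pushing then popping restores the whole state.
assert pop_stack(push_stack(7, s)) == s

# What does not change: setting the pc leaves the operand stack alone,
# so a pop can be moved past it.
assert state_set_pc(4, push_stack(7, s)).operand_stack == push_stack(7, s).operand_stack
assert pop_stack(state_set_pc(4, push_stack(7, s))) == state_set_pc(4, s)
```

The last assertion is the kind of fact a rewriter needs in order to simplify nested terms like those in Subgoal 1'5': it lets a popStack cancel a pushStack even though a state-set-pc sits between them.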

Our current solution is built around equivalences and associated congruence 
rules. We identify what is not changing and introduce an equivalence that groups 
the states that share the unchanged part. We prove that primitive operations 
preserve those equivalences. We prove other properties of those equivalences in 
the form of congruence rules. 

In this ADD1 program proof, we recognize that the program is straight line
code that only modifies the operand stack and the locals. We defined the state
equivalence to capture the following: if the only differences between two states
are the operand stacks and locals of their respective current frames, they are
equivalent.
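One way to picture such an equivalence is to erase the components that are allowed to differ and compare what remains. The Python sketch below does this for a toy state with four fields; the field names and helpers are assumptions for illustration, and the real equiv-state relation is defined over the full JVM state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    heap: tuple
    class_table: tuple
    operand_stack: tuple    # operand stack of the current frame
    local_vars: tuple       # locals of the current frame


def erase_frame_data(s: State) -> State:
    """Forget the parts of the state that straight line code may modify."""
    return replace(s, operand_stack=(), local_vars=())


def equiv_state(a: State, b: State) -> bool:
    """True when a and b agree on everything except operand stack and locals."""
    return erase_frame_data(a) == erase_frame_data(b)


def push_stack(v, s: State) -> State:
    return replace(s, operand_stack=s.operand_stack + (v,))


def set_local(i: int, v, s: State) -> State:
    updated = list(s.local_vars)
    updated[i] = v
    return replace(s, local_vars=tuple(updated))
```

With this definition, `push_stack` and `set_local` preserve equivalence by construction, while any operation that touches the heap or the class table does not, which is why heap-manipulating instructions would call for a different equivalence.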

This strategy has worked well. However, we can foresee limitations in our
approach. When dealing with more complicated operations such as the ones that 
manipulate the heap, we may face the need to define a hierarchy of equivalence 
relations to characterize differences between different operations. 

The following is the final theorem we proved about the ADD1 program^. The
current proof script in ACL2 is about 2000 lines with over 140 user typed
lemmas®.

(defthm first-is-correct
  (let ((old (local-at 2 s1)))
    (implies (and (equiv-state s1 (init-state))
                  (current-thread-exists? s1)
                  (wff-state-regular s1)
                  (wff-thread-table-regular (thread-table s1))
                  (wff-call-frame-regular (current-frame s1))
                  (unique-id-thread-table (thread-table s1))
                  (equal (pc s1) 0)
                  (integerp old))
             (equal (local-at 2 (round-robin-run s1 7))
                    (int-fix (+ 1 old))))))



^ ACL2 has implicit universal quantifiers over all free variables appearing in a formula. 
® This proof script represents a first cut at the problem. It can be improved. The proof 
is available as part of the supporting material [6]. 
The apparent complexity in the statement is partly inherent in the JVM
specification and partly a result of our particular implementation choices.
Almost all of the effort in this proof is devoted to defining a proper domain
and configuring our theorem prover to reason about interactions between
primitives in that domain.

One can argue that we could have saved effort by reasoning about this
program at a higher level. We agree with this view. However, the effort expended
to configure ACL2 to reason about this simple program does not have to be
repeated. We have developed an ACL2 “book” (a file containing definitions and
lemmas) that codifies the necessary “concepts” and “knowledge”, and
configures ACL2 to reason about straight-line programs automatically. We thus have
high confidence in our semantics and can reason about it without difficulty. In
fact, we have proved properties of a different piece of straight line code that
computes (int-fix (+ 4 (* 2 old))), with 100 lines®.

3.2 Recursive Factorial Program 

In this section, we briefly discuss our experience with a second proof effort that
reuses the definitions and lemmas developed in the ADD1 program proof. The
program computes the factorial of its input, or, to be more precise, it computes
the signed integer representation of the low-order 32 bits of the mathematical
factorial.

The program of interest is as follows:

(class "Second"
  (method "fact"
    (parameters int)
    (returntype int)
    (code
      (parsedcode
        (0 (iload_0))
        (1 (ifgt 6))      ;; to TAG_0
        (4 (iconst_1))
        (5 (ireturn))
        (6 (iload_0))     ;; at TAG_0
        (7 (iload_0))
        (8 (iconst_1))
        (9 (isub))
        (10 (invokestatic (methodCP "fact" "Second" (int) int)))
        (13 (imul))
        (14 (ireturn))
        ....))))



Details are available from the supporting materials. 
This program is still very simple conceptually but much more complicated
than the ADD1 program. We proved the following theorem:

(defthm second-is-correct
  (implies (and (poised-for-execute-fact s)
                (wff-state-regular s)
                (wff-thread-table-regular (thread-table s))
                (no-fatal-error? s)
                (integerp n)
                (<= 0 n)
                (intp n)
                (equal n (topStack s)))
           (equal (simple-run s (fact-clock n))
                  (state-set-pc (+ 3 (pc s))
                                (pushStack (int-fix (fact n))
                                           (popStack s))))))

The theorem may be read as follows. Let s be a state poised to invoke our fact 
method, i.e., whose next instruction is an invokestatic of fact. Suppose the 
state is in some suitable sense well-formed, that n is a nonnegative 32-bit inte- 
ger and that n is on top of the stack. Run s a certain number of steps, namely 
(fact-clock n). The result is a state that could be alternately obtained by 
incrementing the pc of s by 3 (the number of bytes in the invokestatic in- 
struction), popping the stack (to remove n), and pushing the int representation 
of (fact n). Here fact is defined in the logic as the standard mathematical 
factorial. 
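The relation between the value the bytecode produces and (int-fix (fact n)) can be explored numerically. In the Python sketch below, `fact` is the mathematical factorial and `java_fact` follows the control flow of the bytecode while wrapping the results of isub and imul to 32 bits; both functions and the assumed meaning of int-fix are our illustration, not definitions from the proof script.

```python
def int_fix(x: int) -> int:
    """Interpret the low-order 32 bits of x as a twos-complement integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def fact(n: int) -> int:
    """Mathematical factorial on unbounded integers."""
    return 1 if n <= 0 else n * fact(n - 1)


def java_fact(n: int) -> int:
    """Follow the bytecode: ifgt, isub, the recursive call, then imul."""
    if n > 0:                                    # (ifgt 6) jumps to TAG_0
        return int_fix(n * java_fact(int_fix(n - 1)))
    return 1                                     # (iconst_1) (ireturn)


# 13! no longer fits in 32 bits, so the two notions start to differ there.
assert fact(13) == 6227020800
assert java_fact(13) == 1932053504 == int_fix(fact(13))
```

Wrapping after every multiplication gives the same answer as wrapping once at the end, because int-fix preserves values modulo 2^32. That is the arithmetic fact behind stating the theorem in terms of (int-fix (fact n)).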

What is new in the program is that it involves the method invocation that 
changes the call stack of the current thread. 

What may at first be surprising is that the second proof is much shorter than
the proof about the ADD1 program. One reason is that in the first proof we
reasoned about a round robin scheduler. However, the more essential reason is
that we reused our results from the first proof about straight line code.

To explain, it is necessary to describe how ACL2 works. ACL2 is a semi- 
automatic theorem prover. The user submits definitions, and formulas that are 
asserted to be theorems. The system attempts to establish the legality of each 
definition and the validity of each alleged theorem. When a formula is proven to 
be a theorem it is converted into a rule and stored in the database. In most cases, 
a rewrite rule is generated. By submitting an appropriate sequence of lemmas 
the user can configure ACL2 to prove theorems with certain strategies. The 
sequence of interactions is called a session. The file containing the definitions 
and lemmas is called a “book.” Subsequent sessions may begin by including 
books taken “from the shelf.” The ACL2 distribution contains many standard 
books on arithmetic, sets, vectors, floating point, etc. Using the ADD1 program
as a “challenge” problem, we created a book that codifies how to reason about
straight-line programs that modify only the operand stack and locals.

To prove fact correct we start with the basic ADD1 book (or just continue
the ADD1 session) and follow the same strategy. We introduce the abstractions
pushFrame and popFrame; we introduce a new state equivalence that captures
what does not change during a call stack manipulation; and we prove theorems
to guide ACL2 reasoning about compositions of operand stack primitives with
call stack primitives.

The surprise in this proof effort is that the semantics of invoking a method
in the JVM (and Java) is rather rich. It involves dynamic class resolution, which in
turn relies on primitives that load a class. Moreover, loading a class is related 
to creating objects dynamically in the heap. Thread synchronization and class 
initialization are also involved. We spend a major part of our efforts in reasoning 
about those primitives. In the final theorem, we assume s is a state in which 
the class is already loaded by asserting the starting state is “equivalent” to 
some constant state where the “Second” class is loaded. More details and some 
explanations are available in the supporting material [6] for this paper. 



4 Review and Related Work 

The challenge in using a deep embedding of a realistic programming language 
like JVM bytecode is managing the complexity at proof-time. We presented two 
proofs to show that the apparent complexity involved in the deep embedding can 
be alleviated by introducing the necessary abstractions and proving properties 
of those primitives in an identified domain. 

Identifying the appropriate abstractions is relatively simple. Most work in- 
volves correctly identifying the domain where the primitives behave according 
to the intuition of the user. Another major effort is establishing properties of 
the abstract primitives and configuring the theorem proving engine to use them 
(typically, but not exclusively) as rewrite rules. 

The main limitation of the current work is that we have not yet developed a
good and concise set of primitives and their properties. Even though the proof of
the ADD1 program is automatic, it is quite long. On the other hand, our experience
with the factorial program proof shows that even with the non-optimized set of
lemmas, one can still benefit from the support of the computer aided reasoning
tool. In developing the lemmas for the factorial program proof we do not have
to think about how to reason about straight-line code - a problem solved once
and for all in the ADD1 proof. In the factorial proof we focus on the primitives
that manipulate the call stack.

We have not explored proofs about more complicated JVM operations in our 
model, such as allocation of new objects in the heap or synchronization using 
monitors. Programs using such primitives have been verified with ACL2 using a 
simpler JVM model that does not include our modeling of dynamic class loading 
and exceptions [16]. 

The sample proofs presented here are proofs for complete correctness. The 
lemma library about primitives can be reused for proving partial correctness of 
Java programs. In his CHARME 2003 paper [15], Moore shows that Floyd-Hoare 
style assertion based program proofs can be constructed directly from the formal 
operational semantics with little extra logical overhead, i.e. with no need to write 
a verification condition generator or other meta-logical tools. To make effective
use of the operational semantics in place of a conventional verification condition 
generator, ACL2 needs to be configured to simplify the compositions of JVM 
primitives. Thus the present work may be viewed as a follow-up to Moore’s 
work on Floyd-Hoare style proofs for bytecode programs on a very complete 
JVM model. 

In addition to using our model to verify properties of bytecode programs, 
we are using it to explore the correctness of the JVM bytecode verifier. In our 
approach, defining a realistic JVM is one of the necessary steps in that effort. 
This is an additional justification for the choice of a deep embedding: it allows 
us to state and prove “meta-level” properties. For existing works on formalizing 
bytecode verification, the special issue on Java bytecode verification from the 
Journal of Automated Reasoning is a good reference [17]. 

The collection of “Formal syntax and semantics of Java” edited by Alves- 
Foss contains many early works in formalizing the Java programming language 
[4]. The Java Language Specification [10] provides the informal specification. 
Although we feel it is hard to formalize a complex language by designing an 
axiomatic semantics, the LOOP project [21] has formalized the semantics of 
Java and a Java annotation language JML based on coalgebra. They are also
deriving proof rules in the style of Hoare logic for embedding Java into PVS [8]. 

To us, a more feasible method is to give Java an operational semantics. In 
[9], Attali et al. present an operational semantics for Java using the structural
operational semantics approach [19]. We think that our operational semantics 
given by state transformation appeals to human intuition better than the opera- 
tional semantics based on structural transformation. This in turn makes it easier 
to validate the formal semantics against informal specifications and benchmark 
implementations. In addition, we feel that a structural operational semantics 
would be awkward to support in ACL2. 

Börger et al. use abstract state machines for modeling the dynamic semantics
of Java [20]. This work seems close to our work at the JVM level. The work by
T. Nipkow et al. on μJava [24] and the Bali project [5] embeds a subset of Java
and the Java bytecode language into the theorem prover Isabelle/HOL to reason
about the type safety of those languages. Recently, J. Meseguer’s group at UIUC
has used the rewriting logic and engine Maude [2] to formalize the semantics of
Java and the JVM [14].

In contrast to the above efforts, our work presented in this paper is focused 
on modeling an executable JVM and reasoning about Java programs via the
direct and intuitive state transformation semantics of its corresponding bytecode 
program on the JVM model. 



5 Conclusion 

To use a general purpose theorem prover in formal program verification, the
semantics of a program to be verified must be expressed in the language of the 
theorem prover’s logic. 
In this paper, we show that one can deeply embed the Java bytecode
language, a fairly complicated language with rich semantics, into the first order
logic of ACL2 by modeling a realistic JVM. We reason “about Java programs” 
by compiling them with Sun’s javac and then reasoning about the bytecode. 

We claim that this is a viable approach to Java program verification.
One of the obvious advantages of deep embedding is that its operational nature 
makes the semantics correspond closely to informal descriptions in the JVM 
specification and with benchmark implementations, increasing one’s confidence 
in the model. The behavior of programs under the model and properties of 
the model are then derived by the theorem proving engine, increasing one’s 
confidence that the reasoning is sound. 

The central question is whether we can effectively deal with the complexity 
introduced by this approach. We show that with the support of a user guided, 
semi-automatic computer proof assistant, the user can reason about programs 
at a fairly intuitive level. In a system like ACL2 the necessary support can be 
arranged by defining appropriate abstractions and proving lemmas about them 
for automatic use by the system. We demonstrate this by covering two concrete
proofs, with the latter one reusing the results of the first one as a lemma library.

We feel the main limitation of the current work is that our lemma libraries for 
reasoning about Java programs are still unoptimized and only cover selected 
JVM primitives. Our current focus has been on formalizing the correctness of 
the Java bytecode verifier. We are looking forward to extending our work to 
provide a full-fledged Java verification system. 
1 Introduction 

We consider the old problem of proving that a computer program meets some 
specification. By proving, we mean machine checked proof in some formal logic. 
The programming language we choose to work with is a call by value functional 
language, essentially the functional core of Standard ML (SML). In future work 
we hope to add exceptions, then references and I/O to the language. 

The full SML language has a formal definition in terms of a big-step op- 
erational semantics [MTHM97]. While such a definition may support formal 
reasoning about meta-theoretical properties of the language, it is too low-level 
for convenient reasoning about programs [Sym94,GV94]. Our approach stands in 
an alternative tradition of high-level, axiomatic program logics [GMW79,Pau87], 
and allows programmers to reason relatively directly at a level they are familiar 
with. In these respects, our work has roots in the logic of the LAMBDA 3.2 
theorem prover and the ideas of Fourman and Phoa [PF92] . 

In contrast to some approaches, where the programming language is embed- 
ded in a first order logic [Tho95,HT95,Sta03], we have chosen to use higher order 
logic (HOL) as a meta language in order to have a rich set-theoretic language 
for writing program specifications. For example, we will discuss a program for 
sorting lists. The specification involves mathematical definitions of being an or- 
dered list and being a permutation of another list, which are expressed in HOL 
using inductively defined relations. 

A key feature of our approach is that the meaning of the logic can be ex- 
plained in terms of purely operational concepts, such as syntactic definability 
and observational equivalence. Thus the logic will be intelligible to SML pro- 
grammers. On the other hand, the soundness of our logic with respect to this 
interpretation can be justified by a denotational semantics; indeed, in designing 
our logic we have relied on well-understood denotational models for guidance. 

It is clear that non-trivial proofs about programs require powerful proof 
automation facilities combined with flexible user interaction. The Isabelle/HOL 
system [NPW02] provides a ready made proof environment with these features. 
Using a higher order abstract syntax (HOAS) presentation in Isabelle/HOL, 
we have done pragmatic experiments without developing syntactical and logical 
foundations from scratch. However, we have not found it possible to give a 
completely faithful encoding of our logic in Isabelle/HOL (see sections 2.1 and 
3.3), so our work should be regarded as an experimental prototype rather than 
a finished tool for reasoning about programs. We have in mind an approach that 
would fix these problems (section 3.3), but building a system that implements 
this approach is left as a possibility for future work. 

* Research supported by EPSRC grant GR/N64571: “A proof system for correct 
program development”. 

The Isabelle/HOL/Isar source files of our work are available from URL 
http://homepages.inf.ed.ac.uk/rap/mlProgLog.tgz. 

Related Work. In addition to related work mentioned above, our work is very 
close in spirit to Extended ML [KST97,KS98]. That work takes specification and 
reasoning about official SML programs much further than we do, including SML 
program modules, and deep study of modularity for specifications. However, our 
approach differs from that of Extended ML in its use of insights from denota- 
tional semantics, which has enabled us to design a clean and soundly-based logic, 
without the explosion in complexity that beset the Extended ML project. 

A foundational development of domain theory, also in Isabelle/HOL, is de- 
scribed in [MNvOS99]. This work is not an operational program logic, but pro- 
vides a HOL type of continuous functions, and the tools to reason about them. It 
also goes further in uniform definition of datatypes than we have yet. However, 
the need to reason foundationally limits its pragmatic convenience. Furthermore, 
we believe our presentation, based on a logically fully abstract model (section 
3.2), can be soundly extended to prove more observational equivalences than the 
system of [MNvOS99]. 

An embedding of the OCaml language into Nuprl type theory is reported in 
[Kre04]. Since Nuprl is extensional, fixpoints can be directly represented using 
the Y combinator. However, these fixpoints can only be typed in a total setting, 
so this approach cannot reason about non-terminating functions, but only about 
functions total on a specified domain. For example, our proof (section 5.1) that 
removing all zeros from a stream of naturals returns a stream without any zeros 
cannot be developed in the Nuprl approach. 

Structure of the Paper. In section 2 we present the syntax of the core pro- 
gramming language, and axioms of our logic for this core. In section 3 we explain 
the operational interpretation we have in mind for the logic, and outline the deno- 
tational semantics that underpins and justifies it. In section 4 we add datatypes 
for natural numbers and polymorphic lists to our language, and describe some 
case studies in reasoning about programs on these datatypes. In fact, reasoning 
about programs on these well-founded datatypes is not so different from 
reasoning in logics of total functions, like HOL itself. Thus in section 5 we consider 
the recursive datatype of streams, and set a problem for ourselves that cannot be 
treated in a coinductive system of total functions. Indeed, we need one more gen- 
eral axiom to reason about recursive datatypes. The example of streams points 
the way towards a uniform treatment of all positive recursive datatypes. 
2 The Core Language 

2.1 Syntax of the Programming Language 

See figure 1. There is a typeclass, SML, to contain programming language types, 
and a subclass, SMLeq, for SML equality types. Types in other typeclasses re- 
tain their purely logical meaning in HOL. Variables in typeclass SML range over 
syntactic programs. 

We use a higher order syntax embedding of the programming language into 
Isabelle/HOL: an ML function type constructor, -> (which applies only to SML 
types, and will be axiomatized as strict), is distinguished from the logical HOL 
function type, => (which applies to all HOL types, and is non-strict, even on SML 
types). As usual, there are constants 

lam :: "('a => 'b) => 'a->'b" (binder "fn " 30) 

APP :: "['a->'b,'a] => 'b" (infixl "$" 55) 

relating object and meta function types. In these declarations, 'a and 'b are 
inferred to be in typeclass SML, and $ is infix application for SML functions. 
Isabelle binder syntax allows us to write fn x. F x for lam F, where F has HOL 
functional type. We have polymorphic constants Fix (a fixpoint operator) and 
bot (a non-terminating program), which are definable in official SML. 

UNIT, BOOL, ** (product) and ++ (sum) types are given atomically, with their 
constructors (e.g. tt and ff) and destructors (e.g. BCASE). From the declared 
type of BCASE you can see that it is non-strict in its branches, which is correct 
for an SML case statement. 

Using Isabelle syntax translations, we can improve our syntax somewhat (see 
bottom of figure 1), but Isabelle parsing is so complex that we prefer not to steer 
too close to the wind with overloading and syntax translations. 

Isabelle/HOL typechecking over typeclass SML serves to typecheck programs. 
This is very convenient for both developing and using our tool, but not quite 
faithful to the SML definition, as HOL polymorphism is not the same as SML 
polymorphism. For example, ML let polymorphism is not captured in our encod- 
ing. Different representations are possible, with explicit typing judgements, that 
would overcome this problem, but these are significantly more complicated. 

2.2 Logic for the Core Programming Language 

HOL equality over types in the SML typeclass represents observational equiva- 
lence in the SML semantics, i.e. indistinguishability in any context. Equal pro- 
grams may be intensionally different. For example, a naive Fibonacci program is 
equal to an efficient one, although they have different complexities. This is part 
of our approach: prove contextual properties of a simple, but inefficient program, 
prove that an efficient program is equal to the simple one, and conclude that the 
efficient program has the same contextual properties. 
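The proof strategy just described can be illustrated outside the logic. The following is only a sketch in Python rather than in the paper's SML core, and the names fib_naive and fib_fast are hypothetical. It contrasts a naive program that transcribes the recursion equations with an efficient one, and compares them on a finite sample. Testing of this kind only suggests equality; the point of the logic is that equality is proved for all defined inputs.

```python
def fib_naive(n: int) -> int:
    """Direct transcription of the Fibonacci recursion equations (exponential time)."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


def fib_fast(n: int) -> int:
    """Linear-time version that carries the last two values along."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# A finite sample of inputs; this is evidence of equality, not a proof of it.
agree_on_sample = all(fib_naive(n) == fib_fast(n) for n in range(20))
```

Once the two programs are known to be equal, any contextual property established for the simple one transfers to the efficient one, which is the step the logic justifies and a test suite cannot.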

In figure 2 we define a judgement of definedness (i.e. termination), dfd, and 
syntax udfd for its negation. The defined constant milter will be explained 
below. 
classes 
  SML < type     -- {* a class of programming language types *} 
  SMLeq < SML    -- {* a subclass for equality types *} 
defaultsort SML 

typedecl UNIT 
typedecl BOOL 
typedecl ('a,'b) "->" (infixr 80)    -- {* functions *} 
typedecl ('a,'b) "++" (infixr 85)    -- {* sums *} 
typedecl ('a,'b) "**" (infixr 90)    -- {* products *} 

arities 
  UNIT :: SMLeq 
  BOOL :: SMLeq 
  "->" :: (SML, SML) SML 
  "++" :: (SML, SML) SML 
  "++" :: (SMLeq, SMLeq) SMLeq 
  "**" :: (SML, SML) SML 
  "**" :: (SMLeq, SMLeq) SMLeq 

consts -- {* bottom, unit and bool *} 
  bot   :: "'a::SML"                       -- {* polymorphic bottom *} 
  UN    :: UNIT ("<>") 
  tt    :: BOOL 
  ff    :: BOOL 
  EQ    :: "('a::SMLeq) -> 'a -> BOOL" 
  BCASE :: "['a, 'a] => (BOOL ->'a)"       -- {* non-strict *} 

consts -- {* product: one constructor *} 
  PAIR  :: "'a -> 'b -> 'a ** 'b" 
  PCASE :: "('a -> 'b -> 'c) => ('a ** 'b -> 'c)"    -- {* non-strict *} 

consts -- {* sum: two constructors *} 
  inl     :: "'a -> 'a ++ 'b" 
  inr     :: "'b -> 'a ++ 'b" 
  SumCASE :: "['a->'c, 'b->'c] => ('a ++ 'b -> 'c)"  -- {* non-strict *} 

consts -- {* functions and recursion *} 
  lam :: "('a => 'b) => 'a->'b"    (binder "fn " 30) 
  APP :: "['a->'b,'a] => 'b"       (infixl "$" 55) 
  Fix :: "(('a->'b) -> 'a->'b) -> 'a->'b" 

syntax -- {* some syntactic sugar *} 
  IF    :: "[BOOL, 'a, 'a] => 'a" 
  "[,]" :: "'a => 'b => 'a ** 'b"        (infixr 30) 
  "[=]" :: "[('a::SMLeq), 'a] => BOOL"   (infixl 34) 

translations 
  "IF b x y" == "(BCASE x y) $ b"    -- {* x and y are non-strict *} 
  "x[,]y"    == "PAIR $ x $ y"       -- {* pairing strict in both args *} 
  "x[=]y"    == "EQ $ x $ y"         -- {* EQ strict in both args *} 

Fig. 1. Language 




Reasoning About CBV Functional Programs in Isabelle/HOL 

constdefs -- {* definedness defined in terms of bot *} 
  dfd :: "'a => bool" 
  "dfd x == x ~= bot" 
translations 
  "udfd x" == "~ dfd x" 

-- {* we will use HOL naturals to talk about least fixed point *} 
consts -- {* usual iteration on HOL naturals *} 
  iter :: "nat => ('a::type) => ('a => 'a) => 'a" 
constdefs -- {* special iterator for use in Fix_min axiom *} 
  milter :: "nat => ('a->'b) => (('a->'b) -> 'a->'b) => 'a->'b" 
  "milter n b F == iter n b (%h. fn x. F $ h $ x)" 

Fig. 2. Logical preliminaries 

Axioms for the core are given in figure 3. (One more general axiom will 
be introduced in section 5.1.) Application is strict (see axiom beta_rule); any 
expression lam F is defined. There is an extensionality rule, fn_ext, for SML 
functions. Eta follows from extensionality. 

Unlike [Sta03], we do not use a notion of value in formulating our axioms, 
but the notion, dfd, of definedness, since observational equivalence (equality in 
the logic) preserves definedness, but not “valueness”. 

UNIT, BOOL, ** and ++ types are treated as if inductively defined. Their 
constructors and destructors are dfd, and their computation rules are axiomatized 
(e.g. if_true and if_false). As mentioned, the CASE constants are non-strict 
in their branches: when applied to a value, only the chosen branch is evaluated. 
Each of the type (constructors) UNIT, BOOL, ** and ++ also has an induction 
principle. 

Least fixpoint axiom. We want an axiom to say that Fix is the least fixpoint 
operator. First, assuming F is defined, from axiom Fix_rule we have 

    Fix $ F = fn x. F $ (Fix $ F) $ x. 

Informally, Fix $ F should be the “limit” of approximations 

    h_0     = ⊥ 
    h_{n+1} = fn x. F $ h_n $ x 

Rewriting this using iter, the iteration constant over HOL naturals (figure 2), 
we have 

    h_n = iter n ⊥ (%h. fn x. F $ h $ x). 

Abstracting this equation by n, F, and ⊥¹, we get the definition of milter in 
figure 2. 

To state that Fix $ F is the least fixpoint of F we say that if Fix $ F is 
defined in any context C :: ('a->'b) => 'c, then some finite unfolding, h_n, is 
already defined in that context: 

    dfd (C (Fix $ F)) ==> EX n. dfd (C h_n) 

Using milter for h_n in this equation, we get the axiom Fix_min of figure 3. 

¹ For technical reasons it is convenient to parameterise milter by the base case. 

axioms 
-- {* application *} 
  bot_ap [simp]: "udfd f ==> udfd (f $ x)" 
  ap_bot [simp]: "udfd x ==> udfd (f $ x)"    -- {* strict *} 
-- {* function types *} 
  beta_rule [simp]: "dfd x ==> (lam F) $ x = F x"    -- {* call-by-value *} 
  fn_ext: "[| dfd f; dfd g; !!x. dfd x ==> (f$x) = (g$x) |] ==> f = g" 
  fn_dfd [simp]: "dfd (lam F)" 
-- {* least fixpoints *} 
  Fix_rule: "Fix = (fn F x. F $ (Fix $ F) $ x)" 
  Fix_min: "[| dfd F; dfd (C (Fix $ F)) |] ==> 
              EX k. dfd (C (milter k bot F))" 
-- {* UNIT type *} 
  unit_Induct: "[| P <>; dfd x |] ==> P x" 
  unit_dfd [simp]: "dfd <>" 
-- {* BOOL type *} 
  dfd_BCASE [simp]: "dfd (BCASE f g)" 
  boolinduct: "[| P tt; P ff; dfd x |] ==> P x" 
  if_true [simp]: "IF tt x y = x" 
  if_false [simp]: "IF ff x y = y" 
  eq_dfd [simp]: "[| dfd x; dfd y |] ==> dfd (x [=] y)" 
  eq_reflection: "((x [=] y) = tt) = (dfd x & x = y)" 
-- {* product types *} 
  dfd_PCASE [simp]: "dfd (PCASE f)" 
  pair_induct: "[| !!x y. [| dfd x; dfd y |] ==> P(x[,]y); dfd z |] ==> P z" 
  pair_dfd [simp]: "[| dfd x; dfd y |] ==> dfd (x[,]y)" 
  split [simp]: "PCASE c $ (x[,]y) = c $ x $ y" 
-- {* sum types *} 
  dfd_SumCASE [simp]: "dfd (SumCASE f g)" 
  dfd_inl [simp]: "dfd inl" 
  dfd_inr [simp]: "dfd inr" 
  SumCASE_inl [simp]: "SumCASE f g $ (inl $ x) = f $ x" 
  SumCASE_inr [simp]: "SumCASE f g $ (inr $ y) = g $ y" 
  Sum_induct: "[| !!x. dfd x ==> P (inl $ x); 
                  !!y. dfd y ==> P (inr $ y); dfd z |] ==> P z" 

Fig. 3. Core language axioms 

The notion of a function being total in one argument is defined: 

    totl :: "('a -> 'b) => bool" 
    "totl f == ALL x. dfd x --> dfd (f $ x)" 

This is used in examples below. 
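The finite-unfolding reading of Fix_min given above can be sketched concretely. The following Python fragment is an illustration under stated assumptions, not part of the formal development: the undefined program bot is modelled by raising a hypothetical Undefined exception (in the logic bot is a non-terminating program), unfold plays the role of milter, and fact_step is a hypothetical functional F whose fixpoint is factorial.

```python
class Undefined(Exception):
    """Stands in for bot: a computation that yields no value (an assumption of this sketch)."""


def bot(_x):
    raise Undefined()


def fact_step(h):
    """A functional F whose least fixpoint is the factorial function."""
    return lambda x: 1 if x == 0 else x * h(x - 1)


def unfold(n, base, step):
    """n-fold iteration of step starting from base, in the role of milter n base F."""
    h = base
    for _ in range(n):
        h = step(h)
    return h


def is_defined(thunk):
    """True if running thunk produces a value rather than hitting bot."""
    try:
        thunk()
        return True
    except Undefined:
        return False


def least_unfolding(context, step, limit=50):
    """Smallest k <= limit such that context(h_k) is defined, or None if there is none."""
    for k in range(limit + 1):
        h_k = unfold(k, bot, step)
        if is_defined(lambda: context(h_k)):
            return k
    return None
```

For the context that applies its argument to 5, least_unfolding returns 6: the approximation h_k is defined on an input x exactly when x < k, so any experiment that observes a result has used only finitely many unfoldings of the recursion.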

2.3 Observational Order 

We define observational order, observational equivalence (syntax x <o= y and 
x =o= y resp.) and observational limit. 

    obsLeq :: "'a => 'a => bool" (infixl "<o=" 18) 
    "a <o= b == ALL (C::'a => UNIT). dfd (C a) --> dfd (C b)" 
    obsEq :: "'a => 'a => bool" (infixl "=o=" 18) 
    "a =o= b == (a <o= b) & (b <o= a)" 
    obsLim :: "'a => (nat => 'a) => bool" 
    "obsLim y x == 
       ALL (C::'a => UNIT). dfd (C y) = (EX (n::nat). dfd (C (x n)))" 

The relation <o= is in fact a partial order, preserved by every context. 
Moreover, bot is the <o=-least element of every SML type. It is worth noting 
that the following are equivalent: 

  - a =o= b 
  - ALL (C::'a => UNIT). C a = C b 
  - ALL (C::'a => UNIT). dfd (C a) = dfd (C b) 
  - EX x. obsLim a x & obsLim b x 

The lemma we need for later proofs is: 

    Fix_lim_iter: "dfd F ==> obsLim (Fix $ F) (%n. milter n bot F)" 

saying that Fix $ F is the observational limit of finite iterations of F. 

3 Interpretations of Our Logic 

We outline two kinds of semantic interpretations for our logic: one in terms of 
purely operational concepts, and one in terms of a denotational model for the 
programming language. The former is what we expect the programmer to have 
in mind, while the latter is used to justify the soundness of our logic, and also 
to inspire the design of the logic in the first place. The agreement between these 
two interpretations is a property known as logical full abstraction [LP97]. 

The language presented in section 2 is intended to provide an extensible core 
for more realistic programming languages, so we formulate our interpretations 
in a general setting. To begin with, let us merely assume that we have 
- A programming language L consisting of types of typeclass SML, and of terms 
  of such types, extending the language defined by figure 1. 

- An intended operational semantics for L. It suffices to give a relation M ⇓ v 
  between closed monomorphic terms M of L and certain “observable values” 
  v, whose precise nature we need not specify². 

- A logical language K(L), whose formulae are constructed from terms of L 
  by means of the usual logical operators =, /\, \/, ALL, EX. 

In the logic presented in Section 2, there are many formulae not in K(L), since 
for instance we may mix types of L with HOL types such as nat. However, in 
order to give the idea behind the operational interpretation, it is simplest to 
concentrate on K(L). 



3.1 Operational Interpretation 

We now give a simple way of reading formulae of K(L) in terms of operational 
concepts, by defining what it means for a formula to be operationally true. For 
closed monomorphic formulae P (i.e. those containing no free term or type 
variables), operational truth is defined by structural induction: 

- A formula M=N is operationally true if M and N are observationally 
  equivalent: i.e., for all contexts C(-) of L and all observable values v we have 

      C(M) ⇓ v  iff  C(N) ⇓ v        (1) 

  The programming intuition is that M may be replaced by N in any larger 
  program without affecting the result³. 

- A formula P/\Q is operationally true if both P and Q are operationally 
  true; similarly for \/ and ~. Thus, the propositional connectives have their 
  familiar classical reading. 

- A formula ALL (x::t).P is operationally true if, for all closed terms M :: t of 
  L, the formula P[M/x] is operationally true. Similarly for EX. The important 
  point is that variables range over syntactically definable programs, rather 
  than elements of some independent mathematical structure. 

If two terms are observationally equivalent, they will satisfy exactly the same 
predicates; i.e. substitutivity of equality is sound for this interpretation. Thus, 
the above is the usual classical interpretation of first order logic (with a separate 
ground sort for each type), where a type t is interpreted as the set of closed 
monomorphic terms of type t modulo observational equivalence. 

² Typically the observable values would be printable values of ground types such as 
integers and booleans, plus a dummy value used to indicate termination for programs 
of higher type. 

³ For how this relates to the formal definition of observational equivalence given in 
section 2, see section 5.1 below. 
We now extend our interpretation to open and polymorphic formulae: 

- An open monomorphic formula P (with free variables x_1, ..., x_n) is 
  operationally true if all of its closed instances P[M_1/x_1, ..., M_n/x_n] 
  are operationally true, where the M_i are closed terms of appropriate types. 

- A polymorphic formula P (with type variables α_1, ..., α_m) is operationally 
  true if all its monomorphic instances P[t_1/α_1, ..., t_m/α_m] are 
  operationally true, where the t_i are monomorphic types of L. 

Is it convincing, on purely operational grounds, that the axioms of figure 3 
are operationally true? In principle this might depend on the language L, but 
in fact most of our axioms have been formulated to be true for a wide range of 
languages, even including non-functional fragments of SML. 

Glossing over details, the only axioms that raise interesting questions are 
fn_ext and Fix_min. The axiom fn_ext (function extensionality) is the only 
one of our axioms that is specific to functional languages, and corresponds to 
what is known as the context lemma: if two programs are applicatively equivalent 
then they are observationally equivalent. The idea behind axiom Fix_min is that 
any “experiment” C which yields a value when performed on a term Fix $ F 
can only unroll the recursion operator a finite number of times, so that the same 
experiment must succeed when performed on milter k bot F for some k. 

Fix_min plays more or less the same role as the familiar Scott induction 
principle in program logics such as LCF [Sco93]. We prefer the Fix_min axiom 
partly because it avoids the reference to inclusive predicates, and partly because 
it is not dependent on an order relation ⊑ in the style of domain theory. If we 
introduced such a relation as primitive, we would be obliged to axiomatize it, 
which is problematic since the appropriate order relation may vary from one 
language L to another⁴. 

3.2 Denotational Interpretation 

Whilst our axioms can (with hindsight) be justified on purely operational 
grounds, it is better to achieve this by showing that they hold in some de- 
notational model which agrees with our operational one in a suitable sense. The 
use of a denotational semantics has several advantages. Firstly, our understand- 
ing of the model can be used to suggest what the axioms ought to be in the 
first place. Secondly, the verifications that the axioms hold in the model tend 
to involve more abstract reasoning than the corresponding operational verifica- 
tions, and to be more easily transferable from one language to another. Thirdly, 
a denotational semantics can be used to show soundness (and hence consistency) 
for the whole logic, not just the fragment K(L), whereas it is unclear how the 
operational interpretation could be extended to cover types such as UNIT->nat. 

Without going into technical details, the model we have in mind is a presheaf 
category [C^op, Set], where C is some denotational model of L. Types of our logic 
will be interpreted by objects X in the presheaf category, and closed terms by 
morphisms 1 → X. We then interpret our logic in the ordinary classical way 
over the homsets Hom(1,X). 

⁴ There are even “functional” languages for which the order relation is not defined 
extensionally, see e.g. [Lon99]. 

The category [C^op, Set] has two important full subcategories, corresponding 
to Set and to C itself. We use objects of Set to interpret pure HOL types such as 
nat or nat->nat, and objects of C to interpret SML types. Thus, [C^op, Set] offers 
a model in which the ordinary mathematical universe of sets lives side-by-side 
with the computational universe of SML types and programs. 

Moreover, we can choose the category C to be a model of L that is both 
fully abstract (observationally equivalent programs have the same denotation) 
and universal (every element of the relevant object is the denotation of some 
program). From these facts it is not hard to see that, when restricted to K(L), 
our interpretation agrees precisely with the operational one given earlier. This 
goodness-of-fit property is known as logical full abstraction. 

In future work we will extend our logic to deal with some non-functional 
fragments of SML including exceptions, references and I/O. An overview of the 
denotational ideas underpinning our approach is given in [Lon03]. 

3.3 Some Serious Problems 

An attempt to give a denotational semantics in this way for the whole of our 
logic, as currently formalized, shows up two significant problems. These arise 
from certain features of Isabelle/HOL: firstly, the definite description operator 
(written THE) is available for all types, and secondly, the mathematical set bool 
of booleans also does duty as the type of propositions. Consider the following 
“programs” : 

    UNIT_swap :: "UNIT -> UNIT" 
    "UNIT_swap == fn x. (THE y. ~y=x)" 

    UNIT_swap' :: "UNIT -> UNIT" 
    "UNIT_swap' == fn x. If (x=bot) <> bot" 

Each of these terms claims to be a function that swaps bot and <>. However, 
this function cannot be definable in SML (with it, one could solve the halting 
problem), violating our operational requirement that terms of an SML type are 
SML definable. In fact, one can derive a contradiction using the term UNIT_swap 
and the axiom Fix_rule, since UNIT_swap clearly does not have a fixed point. 
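The claim that such a swap has no fixed point can be checked by brute force on a two-point model. This is a sketch only: it assumes that, up to observational equivalence, the type UNIT has exactly the two inhabitants bot and <>, represented here by the hypothetical Python constants BOT and UNIT.

```python
BOT = "bot"   # the non-terminating program
UNIT = "<>"   # the unique defined value of type UNIT


def unit_swap(x):
    """The would-be function that exchanges bot and <>."""
    return UNIT if x == BOT else BOT


# Enumerate both inhabitants and collect any x with unit_swap(x) == x.
fixed_points = [x for x in (BOT, UNIT) if unit_swap(x) == x]
```

The list fixed_points is empty, whereas every function definable in the language is expected to have a least fixed point; this mismatch is what makes the term inconsistent with the fixpoint axioms.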

The definite description operator is pragmatically essential in pure HOL, and 
useful in specifications of programs, but its use must be controlled, as the above 
example shows. Also, the two roles of bool must be separated out. Our proposed 
solution to these problems (which can be justified by our denotational semantics) 
is as follows. First, we introduce a typeclass mathtype for “pure mathematical 
types”, a subclass of type analogous to SML and corresponding denotationally to 
Set. We then insist that free variables in the body of a definite description are 
restricted to be of class mathtype. We also distinguish appropriately between the 
mathematical type bool of booleans and a type prop of propositions (not itself 
a mathtype). Finally, we also restrict the HOL function extensionality rule to 
functions whose domain is a mathtype (though we do not know whether our logic 
is consistent without this last restriction). Unfortunately, this proposal cannot 
be implemented without significant re-engineering of Isabelle/HOL, or building 
our own system from scratch. However, we are confident that all the proofs we 
have done would go through in such a system. 

typedecl NAT 
arities NAT :: SMLeq 

consts -- {* datatype of natural numbers *} 
  ZZ    :: NAT 
  SS    :: "NAT -> NAT" 
  NCASE :: "'a => (NAT -> 'a) => (NAT -> 'a)"    -- {* case is non-strict *} 
axioms -- {* nat as a datatype *} 
  ZZ_dfd [simp]: "dfd ZZ" 
  SS_totl [simp]: "totl SS" 
  dfd_NCASE [simp]: "dfd (NCASE f g)" 
  nat_Induct: 
    "[| P ZZ; !!y. [| dfd y; P y |] ==> P (SS $ y); dfd x |] ==> P x" 
  NCASE_ZZ [simp]: "NCASE ZZ x y = x" 
  NCASE_SS [simp]: "NCASE (SS $ n) x y = (y $ n)" 

Fig. 4. A datatype of natural numbers 

4 Inductive Datatypes 

In section 5 we indicate, using the example of streams, how all positive recursive 
datatypes can be uniformly constructed from the types of their constructors. In 
this section we simply axiomatize inductively defined (well founded) datatypes 
NAT and LIST as examples, to show we can reason about them straightforwardly. 

Modulo a good deal of detailed work, reasoning about total programs over 
well founded datatypes is not so different from reasoning about systems of total 
functions, such as type theory or HOL itself. For example, uniform iteration and 
recursion functions are defined (primitive recursion over NAT, fold over LIST, 
. . . ), and their properties proved. These can be used to define other programs 
whose totality (on defined inputs) follows easily. 

4.1 Natural Numbers 

The formalization of NAT is shown in figure 4. There are constants for the con- 
structors ZZ and SS, and for the eliminator NCASE. These are axiomatized to be 
dfd and totl. There are axioms for the computation of NCASE. Only the induc- 
tion axiom, nat_Induct, while natural, needs serious semantic justification. 

By primitive recursion over HOL natural numbers (nat) there is an injection 
from nat onto the defined NATs. By this means we can convert many properties 
of NAT into properties of nat, which may be automatically proved by Isabelle’s 
tactics. 
By HOL inductive definition we define order relations on NAT, e.g. the 
less-than relation (syntax x[<]y). 

    inductive NATLT intros 
      NATLT_Z: "dfd x ==> ZZ [<] SS $ x" 
      NATLT_S: "m [<] n ==> SS $ m [<] SS $ n" 
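The two introduction rules can be read backwards as a decision procedure. The sketch below is an illustration in Python, not the paper's formalization: defined NAT values are represented by non-negative Python integers (an assumption), and the hypothetical function natlt searches for a derivation built from NATLT_Z and NATLT_S.

```python
def natlt(m: int, n: int) -> bool:
    """Decide m [<] n by looking for a derivation from the two rules.

    NATLT_Z:  ZZ [<] SS $ x
    NATLT_S:  m [<] n  ==>  SS $ m [<] SS $ n
    """
    if n == 0:
        return False             # no rule concludes anything of the form _ [<] ZZ
    if m == 0:
        return True              # NATLT_Z applies, because n is a successor
    return natlt(m - 1, n - 1)   # NATLT_S read from conclusion to premise


# On a small grid the derived relation coincides with the usual order on integers.
matches_builtin = all(natlt(m, n) == (m < n) for m in range(8) for n in range(8))
```

Specifications are stated against the relation itself; a BOOLean valued program computing less-than would then be proved to agree with it.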

For specification, this relation is more convenient than the BOOLean valued pro- 
gram that computes less-than. Complete induction can be derived, and from 
this a least number principle and well founded induction for NAT-valued mea- 
sures. As an example, we have defined a naive Fibonacci program, and a fast 
Fibonacci program, and proved they are equal. The naive Fibonacci program is 
easily seen to satisfy the Fibonacci recursion equations, hence so does the fast 
Fibonacci program. Moreover, every program satisfying the Fibonacci recursion 
equations is equal to the naive Fibonacci program. 

4.2 Lists 

Polymorphic LIST is axiomatised analogously with NAT. We use infix [::] for 
CONS; the list eliminator is LCASE. Basic functions like map, append, flatten 
and reverse are easily defined from the uniform fold operator. Their correctness 
follows from showing they have the expected recursion equations, usually by a 
few steps of computation. Many basic properties follow by easy induction: map 
distributes over composition, append is associative, reverse is involutive, .... We 
give an efficient reverse program, and show it is equal to naive reverse. 

After defining a length function, we derive a length induction principle for 
lists from the well founded measure induction over NAT (section 4.1). With this we 
prove a more challenging example, theorem 16 from Paulson’s textbook [Pau91]: 

aop $ y $ (foldleft $ aop $ e $ xs) = foldleft $ aop $ y $ xs 

where aop is associative and e, a right identity of aop, is dfd. 
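The equation can be spot-checked on concrete operators. The following Python sketch is illustrative only: foldleft is implemented with functools.reduce, and theorem16_holds is a hypothetical helper that evaluates both sides for given arguments. It tests instances and proves nothing, and it ignores definedness side conditions, which have no counterpart for these total Python operators.

```python
from functools import reduce


def foldleft(op, e, xs):
    """Left fold: foldleft op e [x1, ..., xn] = op(...op(op(e, x1), x2)..., xn)."""
    return reduce(op, xs, e)


def theorem16_holds(op, e, y, xs) -> bool:
    """Check  op y (foldleft op e xs) == foldleft op y xs  on one instance."""
    return op(y, foldleft(op, e, xs)) == foldleft(op, y, xs)


def concat(a, b):
    return a + b


def minus(a, b):
    return a - b


# String concatenation is associative and "" is a right identity.
ok_concat = theorem16_holds(concat, "", "q", ["a", "b", "c"])

# Subtraction has 0 as a right identity but is not associative, so the equation can fail.
ok_minus = theorem16_holds(minus, 0, 10, [1, 2])
```

Here ok_concat is true and ok_minus is false, which shows that the associativity hypothesis is doing real work in the theorem.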

Sorting. The examples mentioned above are trivial in one sense: correctness is 
expressed in terms of some recursion equations. Our stated reason for axioma- 
tising ML in HOL, instead of FOL, is to have a richer language for program 
specifications. The specification for sorting involves abstract properties ordered 
and permutation. For example, permutation (syntax xs ~ ys) is given as a HOL 
inductive definition: 

    consts perm :: "('a LIST * 'a LIST) set" 
    inductive perm intros 
      perm_trn:  "[| xs ~ ys; ys ~ zs |] ==> xs ~ zs" 
      perm_NIL:  "NIL ~ NIL" 
      perm_CONS: "[| dfd x; xs ~ ys |] ==> x[::]xs ~ x[::]ys" 
      perm_hd:   "[| dfd x; dfd y; dfd zs |] ==> x[::]y[::]zs ~ y[::]x[::]zs" 

    insrt :: "('a -> 'a -> BOOL) -> 'a -> 'a LIST -> 'a LIST" 
    "insrt == fn le x. Fix $ (fn f. LCASE (uni x) 
                                (fn y ys. IF (le $ x $ y) 
                                             (x[::](y[::]ys)) 
                                             (y[::](f $ ys))))" 

    isort :: "('a -> 'a -> BOOL) -> 'a LIST -> 'a LIST" 
    "isort == fn le. Fix $ 
       (fn f xs. LCASE NIL (fn y ys. insrt $ le $ y $ (f $ ys)) $ xs)" 

Fig. 5. Insertion sort program 

    pre_rmZZs :: "(NAT SEQ -> NAT SEQ) -> NAT SEQ -> NAT SEQ" 
    "pre_rmZZs == fn F. SCASE (fn p. 
                     IF (Fst$p [=] ZZ) 
                        (F $ (Snd$p $ <>)) 
                        (Fst$p [:::] (fn z. F $ (Snd$p $ <>))))" 
    rmZZs :: "NAT SEQ -> NAT SEQ" 
    "rmZZs == Fix $ pre_rmZZs" 

Fig. 6. A function to remove all zeros from a NAT SEQ 

We give a program for insertion sort (figure 5), and prove it is correct. The 
sort program itself is a polymorphic function taking in a BOOLean valued order 
function, le, and a list, and returning a sorted list. We use an Isabelle locale 
to specify the properties the order function must have, and prove in that locale 
that isort returns an ordered permutation of the input list. Thus for any 
instantiation of that locale (e.g. with the order function LE over NAT) isort is a 
correct sort program. 
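The insertion sort of figure 5 and its specification can be mirrored in a short executable sketch. This Python version is an illustration under assumptions: Python lists stand in for LIST, the empty-list case of insrt is taken to return the singleton list (our reading of uni x in the figure), and the helpers ordered and is_permutation are hypothetical checks rather than the paper's HOL predicates.

```python
from collections import Counter


def insrt(le, x, xs):
    """Insert x in front of the first element y of xs for which le(x, y) holds."""
    if not xs:
        return [x]
    y, ys = xs[0], xs[1:]
    if le(x, y):
        return [x, y] + ys
    return [y] + insrt(le, x, ys)


def isort(le, xs):
    """Sort by inserting the head into the recursively sorted tail."""
    if not xs:
        return []
    return insrt(le, xs[0], isort(le, xs[1:]))


def ordered(le, xs):
    """Every adjacent pair is related by le."""
    return all(le(a, b) for a, b in zip(xs, xs[1:]))


def is_permutation(xs, ys):
    """Same elements with the same multiplicities."""
    return Counter(xs) == Counter(ys)


def int_le(a, b):
    return a <= b


data = [3, 1, 2, 3, 0]
result = isort(int_le, data)
```

The specification has the same two halves as in the text: ordered(int_le, result) and is_permutation(data, result) should both hold. The locale in the paper plays the role that the unstated assumptions on int_le play here.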

5 Recursive Datatypes: Streams 

The examples in preceding sections, over inductive datatypes, could be carried 
out in logics of total functions. In this section we address an example that cannot 
be treated in logics of totality. Consider the type of polymorphic streams (SEQ 
for sequence), that would be defined in SML by: 

datatype ’a SEQ = SCONS of ’a * (unit -> ’a SEQ) 

We represent this datatype using the well known characterization that 
(’a SEQ, SCONS) is the initial algebra of the functor ST X = ’a ** (UNIT->X). 
As an example over SEQ, consider the function, rmZZs, that recurses through a 
NAT SEQ removing all the zeros (figure 6). A datatype analogous to SEQ is 
definable using coinduction in HOL, Coq, and Nuprl, but the function rmZZs could 
only be definable in a complex way, with a restricted domain. 

Streams are formalised (figure 7) with two constants and three axioms. The 
constructor, SCONS (infix [:::]) is dfd and totl. The other constant, Psi, 
canonically completes the initial algebra property. This is expressed by axiom 
seq_init, which states that if g :: ST('b) -> 'b is dfd, then Psi $ g is the unique 
dfd function f making the diagram commute: 
typedecl 'a SEQ 
arities SEQ :: (SML)SML   -- {* no SMLeq arity *} 

-- {* covariant functor characterises 'a SEQ *} 
types ('a, 'b) ST = "'a ** (UNIT -> 'b)"   -- {* object part of functor *} 
constdefs   -- {* arrow part of functor *} 
  ST :: "('c->'d) -> ('a,'c)ST -> ('a,'d)ST" 
  "ST == fn f. PCASE (fn (a::'a) (h::UNIT->'c). (a [,] (f oo h)))" 
consts   -- {* sequences (lazy lists) *} 
  SCONS :: "('a, 'a SEQ) ST -> 'a SEQ"   -- {* constructor *} 
  Psi :: "(('a,'b)ST -> 'b) -> 'a SEQ -> 'b" 
constdefs   -- {* the initial algebra property *} 
  SEQ_Init_sq :: "(('a,'b)ST -> 'b) => ('a SEQ -> 'b) => bool" 
  "SEQ_Init_sq g f == (dfd f) & ((f oo SCONS) = (g oo (ST $ f)))" 
axioms   -- {* stream as a datatype *} 
  dfd_SCONS[simp]:  "dfd SCONS" 
  totl_SCONS[simp]: "totl SCONS" 
  seq_init: "dfd g ==> SEQ_Init_sq g f = (f = Psi $ g)" 

Syntax note: oo is program composition, i.e. f oo g = fn x. f $ (g $ x). 

Fig. 7. Streams as an initial algebra 



                                            SCONS 
  'a ** (UNIT->'a SEQ)  =  ST('a SEQ)  ------------->  'a SEQ 
            |                                             | 
  PCASE (fn h t. h [,] f oo t)  =  ST(f)                  |  f = Psi(g) 
            v                                             v 
  'a ** (UNIT->'b)      =  ST('b)      ------------->    'b 
                                              g 



From this we define the categorical destructor 

SDESTR :: "'a SEQ -> ('a ** (UNIT->'a SEQ))" 
"SDESTR == Psi $ (ST $ SCONS)" 

and prove that SCONS and SDESTR are inverse isomorphisms. The SML case 
eliminator for streams is defined, and its computation rule proved: 

SCASE :: "('b -> (UNIT -> 'b SEQ) -> 'a) => ('b SEQ -> 'a)" 
"SCASE f == PCASE f oo SDESTR" 
lemma SCASE_SCONS: "SCASE y $ (a[:::]as) = y $ a $ as" 



Examples. Now we can define many standard functions on streams, and prove 
their usual properties: head and tail (shd, stl), nth element from a stream 
(snth), take or drop n elements from the front of a stream (sTAKE, sdrop). For 
example: 
sdrop $ n $ (stl $ xs $ <>) = stl $ (sdrop $ n $ xs) $ <> 

snth $ n $ s = shd $ (sdrop $ n $ s) 

Interesting from a semantic viewpoint, we show that every stream is the 
observational limit of its initial segments 

lemma s_lim_sTAKEs: "obsLim s (%n. sTAKE n $ s)" 

However, we do not yet seem able to prove stream extensionality 

(ALL n. snth $ n $ s = snth $ n $ t) ==> s = t 

which is operationally true. Stream extensionality is equivalent to a characteri- 
zation proposed in [Pit94]. Finally, we cannot prove that (ST, SDESTR) is a final 
coalgebra. Thus, another axiom seems needed. 
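
To make the stream functions discussed in this section concrete, here is a small Python sketch under stated assumptions: a stream is modelled as a pair (head, thunk) in the spirit of the SML datatype SCONS of 'a * (unit -> 'a SEQ), and the names from_fn, rm_zzs, stake, sdrop and snth are illustrative stand-ins, not the Isabelle constants. As in the paper, rm_zzs is partial: on a stream consisting only of zeros it never returns.

```python
def scons(head, tail_thunk):
    """A stream cell: a head together with a suspended tail."""
    return (head, tail_thunk)

def shd(s):
    return s[0]

def stl(s):
    """Force the suspended tail."""
    return s[1]()

def from_fn(f, n=0):
    """The stream f(n), f(n+1), ... (hypothetical helper for building examples)."""
    return scons(f(n), lambda: from_fn(f, n + 1))

def rm_zzs(s):
    """Remove all zeros; diverges if only zeros remain, so it is a partial function."""
    while shd(s) == 0:
        s = stl(s)
    head, tail_thunk = s
    return scons(head, lambda: rm_zzs(tail_thunk()))

def sdrop(n, s):
    for _ in range(n):
        s = stl(s)
    return s

def snth(n, s):
    return shd(sdrop(n, s))

def stake(n, s):
    """First n elements as a Python list, without forcing the tail after the last one."""
    out = []
    for i in range(n):
        out.append(shd(s))
        if i + 1 < n:
            s = stl(s)
    return out

if __name__ == "__main__":
    # 0, 1, 0, 3, 0, 5, ... : zero at even positions, the index itself at odd ones
    alternating = from_fn(lambda n: 0 if n % 2 == 0 else n)
    print(stake(4, rm_zzs(alternating)))
```

The check snth(n, s) == shd(sdrop(n, s)) holds here by definition; in the logic it is a lemma because streams are characterised only through the initial algebra property.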



5.1 Another General Axiom 

Our final general axiom reflects that x =o= y means x and y are indistinguishable 
in any context. 

axioms obs_eq: "x =o= y ==> x = y" 

This can be seen as saying that Leibniz equality (e.g. observational equivalence) 
implies extensional equality. By a fact from section 2.3, this is equivalent to 
uniqueness of observational limits 



obsLim a x ==> obsLim b x ==> a = b 

From this second formulation it is clear that stream extensionality follows from 
s_lim_sTAKEs. Furthermore, from stream extensionality we conjecture we can 
prove that (ST, SDESTR) is a final coalgebra. 

Using stream extensionality, we have proved that the program rmZZs returns 
a sequence with no zeros! 



6 Conclusion 

Reasoning about programs is hard. Our high level, operationally inspired logic 
doesn’t remove the need to reason about the details of a program. However, 
Isabelle’s automation proved very useful for routine details, such as the frequent 
need for case distinction between dfd and udfd arguments in our CBV language. 
There is plenty of scope for special purpose tactics to address other routine tasks. 
We found the use of HOL, with its inductive definition of properties, to be much 
better than first order (i.e. equational) specification, and were also able to convert 
some questions about SML datatypes into questions about HOL types that are 
easily solved in Isabelle/HOL. 
Abstract. We provide a proof that the elegant trick of Olivier Danvy 
for expressing printf-like functions without dependent types is correct, 
where formats are encoded by functional expressions in continuation- 
passing style. Our proof is formalized in the Calculus of Inductive Con- 
structions. We stress a methodological point: when one proves equalities 
between functions, a common temptation is to introduce a comprehen- 
sion axiom and then to prove that the considered functions are exten- 
sionally equal. Rather than weakening the result (and adding an axiom), 
we prefer to strengthen the inductive argumentation in order to stick to 
the intensional equality. 



1 Introduction 

In [1], Olivier Danvy proposes an elegant trick for expressing printf-like functions 
and procedures in the ML type system. His idea is to replace the concrete version 
of the first argument, on which the number and the type of remaining arguments 
depend, with a higher-order function. In order to avoid questions related to 
side-effects, let us consider the sprintf function, which builds a string from its 
arguments. The first argument of sprintf is a format, which specifies the number 
and the type of the remaining arguments. In practice, notably in the C language, 
the format is often a string, where occurrences of %d (respectively, of %s, etc.) 
specify that an integer (respectively, a string, etc.) should be inserted there. For 
instance, in ML syntax, 

sprintf "The %s is %d %s." "distance" 10 "meters" (1) 

would return the string "The distance is 10 meters.". 

It is more convenient, at least for reasoning purposes, to represent formats 
using a concrete type such as lists of an appropriate type of directives. For 
example, the first argument of (1) could be represented by 

[Lit("The "); String; Lit(" is "); Int; String; Lit(".")]. (2) 

In a language where dependent types are allowed, it is then a simple exercise to 
program the desired behavior. In the case of ML, Danvy proposes to represent 
the format by a functional expression: 

lit "The " o str o lit " is " o sint o str o lit ".", (3) 

K. Slind et al. (Eds.): TPHOLs 2004, LNCS 3223, pp. 217-224, 2004. 

(c) Springer- Verlag Berlin Heidelberg 2004 
where o is the sequential composition of functions, and the functions such as 
int and string take a continuation on strings, a string, an argument of the 
appropriate type and return a continuation on strings. More specifically, str is 
defined by λk a s. k(a ^ s), where ^ is string catenation, and sint is defined by 
λk a n. k(a ^ string_of_int n). The definition of lit is λs k a. k(a ^ s). Reducing 
these definitions in (3) yields 

λk a s₁ n s₂. k(a ^ "The " ^ s₁ ^ " is " ^ string_of_int n ^ s₂ ^ ".") (4) 

and we see that applying the following continuation-based version of sprintf to 
a functional format does the job. 

sprintfk := λf. f (λs. s) "" (5) 

An interesting feature of functional formats is that they are more general 
than concrete formats given by either a string as in (1), or a list as in (2): 
concrete formats are bound to a fixed number of data types, whereas functional 
formats are extensible - they can handle any data type X, provided we are given 
a function from X to string. 
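
The behaviour of functional formats can be tried out in any language with first-class functions. The following Python sketch transcribes the definitions of lit, str, sint and sprintfk given above as curried lambdas; it is untyped, so it says nothing about the typing issues the paper is concerned with. The names str_ (renamed to avoid Python's built-in str), compose and kdata-style show_list are assumptions of this sketch.

```python
from functools import reduce

def lit(x):
    """lit x = λk a. k (a ^ x): append a literal to the accumulator."""
    return lambda k: lambda a: k(a + x)

def str_(k):
    """str = λk a s. k (a ^ s): consume one string argument."""
    return lambda a: lambda s: k(a + s)

def sint(k):
    """sint = λk a n. k (a ^ string_of_int n): consume one integer argument."""
    return lambda a: lambda n: k(a + str(n))

def kdata(show):
    """Generic directive for any type with a printer show (illustrates extensibility)."""
    return lambda k: lambda a: lambda x: k(a + show(x))

def compose(*fs):
    """Sequential composition f1 o f2 o ... o fn of functional formats."""
    return reduce(lambda f, g: (lambda k: f(g(k))), fs)

def sprintfk(f):
    """sprintfk = λf. f (λs. s) "": start with the identity continuation and empty string."""
    return f(lambda s: s)("")

if __name__ == "__main__":
    fmt = compose(lit("The "), str_, lit(" is "), sint, lit(" "), str_, lit("."))
    print(sprintfk(fmt)("distance")(10)("meters"))

    # A directive for a type that no fixed set of format letters would cover.
    show_list = lambda xs: ",".join(map(str, xs))
    print(sprintfk(compose(lit("xs="), kdata(show_list)))([1, 2]))
```

Note that this sketch inserts an explicit lit(" ") between the integer and the final string so that the output matches the string shown for (1).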

If we look at types, we remark that str has the type (string → β) → string → 
string → β, sint has the type (string → α) → string → int → α, hence str o sint 
has the type (string → α) → string → string → int → α. In general, the type of 
a functional format has the form (string → α) → string → X₁ → ... → Xₙ → α. 

If two formats f₁ and f₂ are respectively of type 

(string → β) → string → X₁ → ... → Xₙ → β (6) 

and 

(string → α) → string → Xₙ₊₁ → ... → Xₙ₊ₚ → α, (7) 

their composition f₁ o f₂ is of type 

(string → α) → string → X₁ → ... → Xₙ → Xₙ₊₁ → ... → Xₙ₊ₚ → α (8) 

while the type inference mechanism yields 

β = Xₙ₊₁ → ... → Xₙ₊ₚ → α. (9) 

We provide here a formal proof that Danvy’s functional formats are cor- 
rect representations of usual concrete formats. More precisely, for any concrete 
format p, we inductively define its functional representation kformat p and we 
prove that sprintf applied to p yields the same function as sprintfk applied to 
kformat p; these functions are even convertible. 

As the result relates dependent types with polymorphic types, we need a 
logic where these two features are present. Our formalization is carried out in 
the Calculus of Inductive Constructions [2]. Two proof techniques are illustrated: 
making the statement of an inductive property on functions more intensional, 
rather than reasoning on extensional equality; and using type transformers to 
recover what is performed by type inference in (9). A complete Coq script (V8.0) 
is available on the web page of the author. 
2 Type Theory and Notation 

The fragment of the Calculus of Inductive Constructions to be used here includes 
a hierarchy of values and types. On the first level we have basic inductive and 
functional values such as 0 and λx : nat. x. They inhabit types such as nat or 
nat → nat, which are themselves values at the second level and have the type Set. 
In the sequel α, β, γ, δ range over such types. Polymorphic types are obtained 
using explicit universal quantification, e.g. ∀α. α → α. We can also construct type 
transformers such as λα. nat → α, of type Set → Set. The type of Set and of 
Set → Set is called Type. Types can depend on values of any level. 

Functions are defined using the following syntax: 

Definition function_name arg₁ ... argₙ : type_of_the_result := body. 

where type_of_the_result depends on argᵢ, i ∈ 1...n. In the case of a recursive 
definition, Definition is replaced with Fixpoint. 

We suppose that we are given a type string (in Set), endowed with a binary 
operation _ ^ _ (catenation) and the empty string denoted by "". We don’t need 
an algebraic law for catenation. 

3 Concrete Formats 

Our definitions will be illustrated on a format specified by "foo %s bar %i" in 
C language notation. Assuming two strings foo and bar, the structured concrete 
representation that we will use is: 

Definition example := Lit foo (Str (Lit bar (Int Stop))) (10) 

where Str and Int are respectively of type string → format → format and 
int → format → format; format is a dedicated inductive type defined below. 

For the sake of generality (functional formats can handle arbitrary printable 
data) we first introduce a structure for printable data, composed of a carrier X 
and of a function r_X which appends a printed representation of a value of type 
X to the right of a given string. This is equivalent to providing a function for 
converting an inhabitant of X to a string, but turns out to be much more handy. 

Record Printable : Type := mkpr {X : Set; r_X : string → X → string}. 

In Coq, a record is just a tuple and fields are represented by projections. For our 
example, we suppose that we are given a type int for integers and a corresponding 
function r_int. Then we can define pint as mkpr int r_int, and we have X pint = 
int and r_X pint = r_int. We use the notation a ^P x for r_X P a x, where P is 
a Printable, a is a string and x is an X P. 

The type of concrete formats is given by: 

Type format := Stop | Data of Printable × format | Lit of string × format. 

In our example, Int is defined as Data pint. Note that from printable integers, 
it is easy to add printable lists of integers and so on. 

In the sequel, φ ranges over format and P ranges over Printable. 




220 Jean-François Monin 



4 A First Translation 

In this section, we work with a monomorphic version of Danvy’s functional 
formats. This is not satisfactory, but the proof technique that we want to 
use is simple to explain. Polymorphic functional formats will be considered in 
section 5. 

The type associated to a format is: 

Fixpoint type_of_fmt φ : Set := 
  match φ with 
  | Stop ⇒ string 
  | Data P φ ⇒ X P → type_of_fmt φ 
  | Lit s φ ⇒ type_of_fmt φ 
  end. 

For example, type_of_fmt example reduces to string → int → string. 

4.1 Basic Version of sprintf with Dependent Types 

We start with a loop which prints on the right of an additional argument. 

Fixpoint r_sprintf φ : string → type_of_fmt φ := 
  match φ with 
  | Stop ⇒ λa. a 
  | Data P φ ⇒ λa x. r_sprintf φ (a ^P x) 
  | Lit s φ ⇒ λa. r_sprintf φ (a ^ s) 
  end. 

The desired function provides the empty string "" as the initial accumulator to 
the previous function. 

Definition sprintf φ := r_sprintf φ "". 
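
For readers who prefer to run the definitions, here is a Python sketch of concrete formats and of the accumulator-based loop. Python has no dependent types, so the role of type_of_fmt is played only by the number of curried arguments the result expects. The class names mirror the constructors above; pstring, the field names p, s, rest and the uncurried accumulator argument of r_sprintf are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass(frozen=True)
class Printable:
    """A carrier (left implicit in Python) with a function appending a printed value."""
    name: str
    r_X: Callable[[str, object], str]

@dataclass(frozen=True)
class Stop:
    pass

@dataclass(frozen=True)
class Data:
    p: Printable
    rest: "Format"

@dataclass(frozen=True)
class Lit:
    s: str
    rest: "Format"

Format = Union[Stop, Data, Lit]

def r_sprintf(fmt, a):
    """Print on the right of the accumulator a, consuming one argument per Data node."""
    if isinstance(fmt, Stop):
        return a
    if isinstance(fmt, Lit):
        return r_sprintf(fmt.rest, a + fmt.s)
    return lambda x: r_sprintf(fmt.rest, fmt.p.r_X(a, x))

def sprintf(fmt):
    """Start the loop with the empty string as initial accumulator."""
    return r_sprintf(fmt, "")

pint = Printable("int", lambda a, n: a + str(n))
pstring = Printable("string", lambda a, s: a + s)

# Counterpart of example (10), with concrete literals "foo " and " bar ".
example = Lit("foo ", Data(pstring, Lit(" bar ", Data(pint, Stop()))))

if __name__ == "__main__":
    print(sprintf(example)("x")(42))
```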

4.2 Monomorphic Functional Formats 

The following type of Danvy’s sprintfk is allowed in the Damas-Milner type 
system; it can then be used in languages of the ML and Haskell family. Though 
there is no restriction over α, the only form α can take is (type_of_fmt φ) for 
some format φ. However, the point is that φ itself is no longer an argument of 
sprintfk. 

Definition sprintfk : ∀α. ((string → string) → string → α) → α := λf. f (λs. s) "". 

Functional formats are constructed using primitive formats such as lit, str, sint, 
etc. The two latter are themselves special cases of our kdata, which is not 
admitted in ML, in contrast with str, sint, etc. However we keep kdata here for 
the sake of generality in the reasoning. In ML examples, we use only instances 
of kdata. 

Definition kid : (string → string) → string → string := λk a. k a. 

Definition kdata P : ∀α. (string → α) → string → X P → α := 
  λα. λk. λa x. k (a ^P x). 

Definition lit (x : string) : ∀α. (string → α) → string → α := λα. λk a. k (a ^ x). 
4.3 Translation 

Here is the general construction of functional formats from concrete formats. 

Fixpoint kformat φ : (string → string) → string → type_of_fmt φ := 
  match φ with 
  | Stop ⇒ kid 
  | Data P φ ⇒ (kdata P (type_of_fmt φ)) o (kformat φ) 
  | Lit x φ ⇒ (lit x (type_of_fmt φ)) o (kformat φ) 
  end. 

For example, kformat example is convertible with 
(lit foo (string → int → string)) o 
(str (int → string)) o (lit bar (int → string)) o (sint string). 
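
The translation can be exercised with a short Python sketch. Here a concrete format is encoded as nested tuples ("stop",), ("data", r_X, rest) and ("lit", s, rest); this encoding, the helper seq for sequential composition, and the sample printers are assumptions made for the sketch. The type arguments of kdata and lit disappear because Python is untyped. The final comparison only tests agreement on sample arguments, which is weaker than the convertibility result proved in the paper.

```python
def kid(k):
    """kid = λk a. k a"""
    return lambda a: k(a)

def kdata(r_X):
    """kdata P = λk a x. k (r_X P a x)"""
    return lambda k: lambda a: lambda x: k(r_X(a, x))

def lit(s):
    """lit s = λk a. k (a ^ s)"""
    return lambda k: lambda a: k(a + s)

def seq(f, g):
    """Sequential composition f o g of functional formats."""
    return lambda k: f(g(k))

def kformat(fmt):
    """Translate a concrete format (nested tuples) into a functional format."""
    tag = fmt[0]
    if tag == "stop":
        return kid
    if tag == "data":
        return seq(kdata(fmt[1]), kformat(fmt[2]))
    return seq(lit(fmt[1]), kformat(fmt[2]))

def sprintfk(f):
    return f(lambda s: s)("")

def r_sprintf(fmt, a):
    """Direct interpretation of the concrete format, for comparison."""
    tag = fmt[0]
    if tag == "stop":
        return a
    if tag == "data":
        return lambda x: r_sprintf(fmt[2], fmt[1](a, x))
    return r_sprintf(fmt[2], a + fmt[1])

def sprintf(fmt):
    return r_sprintf(fmt, "")

r_int = lambda a, n: a + str(n)
r_str = lambda a, s: a + s
example = ("lit", "foo ", ("data", r_str, ("lit", " bar ", ("data", r_int, ("stop",)))))

if __name__ == "__main__":
    print(sprintf(example)("x")(42))
    print(sprintfk(kformat(example))("x")(42))
```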



4.4 Correctness of sprintfk w.r.t. sprintf 

A brutal attempt to prove that (sprintf φ) = (sprintfk (kformat φ)) holds for all 
φ fails, because the accumulator changes at each recursive call (an induction on 
φ would lead us to prove something on "" ^ s while the induction hypothesis 
is on ""). The usual trick is then to replace "" with a variable (let us call it a) 
which is in the scope of the induction. We first unfold sprintf and sprintfk in 
order to work with r_sprintf and kformat. Then, if we try to prove 

∀a. r_sprintf φ a = kformat φ (λs. s) a (11) 

by induction on φ, we face another problem: how to prove 

λx. r_sprintf φ (a ^P x) = λx. kformat φ (λs. s) (a ^P x) 

from the induction hypothesis (11)? This is a typical case where extensionality 
makes life easier. Adding the following axiom would allow us to finish the proof 
in a trivial way. 

Axiom extensionality: 
  ∀α β, ∀f g : α → β, (∀x : α, f x = g x) → (λx. f x) = (λx. g x). 

But this workaround is not satisfactory. In order to prove the desired 
(intensional) equality, without any additional axiom, we work with a still more 
intensional statement: 

λa. r_sprintf φ a = λa. kformat φ (λs. s) a (12) 

or even an η-reduced version of the latter: 

r_sprintf φ = kformat φ (λs. s). (13) 

The proof is very short. The key is to observe that λa x. k (a ^P x) = kdata P α k, 
and similarly for lit. We can then rewrite r_sprintf as follows: 

Fixpoint r_sprintf1 φ : string → type_of_fmt φ := 
  match φ with 
  | Stop ⇒ λa. a 
  | Data P φ ⇒ kdata P (type_of_fmt φ) (r_sprintf1 φ) 
  | Lit s φ ⇒ lit s (type_of_fmt φ) (r_sprintf1 φ) 
  end. 

The following lemma is easily proved by induction on φ: 

∀φ. r_sprintf1 φ = kformat φ (λs. s). (14) 

Unfolding definitions and converting r_sprintf to r_sprintf1 provides the desired 
corollary. 

Theorem sprintf_sprintfk: ∀φ. sprintf φ = sprintfk (kformat φ). 

5 Typing Formats with Type Transformers 

The previous typing of kformat is unfair. If φ is a given closed format, the 
expression kformat φ has a closed type as well. A limitation of this typing is 
that it prevents formats from being sequentially composed. For example, 

(kformat (Lit foo (Str Stop))) o (kformat (Lit bar (Int Stop))) (15) 

is ill-typed. In order to recover plain Danvy’s functional formats, which do not 
suffer from such limitations, we use type transformers. In some sense, the latter 
implement the type inference mechanism of the ML type system. In our example, 
the type transformer to be considered maps a type α to string → int → α. 

Definition idt := λα. α. 

Definition datat P := λα. (X P → α). 

Fixpoint type_transf_of_fmt φ : Set → Set := 
  match φ with 
  | Stop ⇒ idt 
  | Data P φ ⇒ (datat P) o (type_transf_of_fmt φ) 
  | Lit s φ ⇒ type_transf_of_fmt φ 
  end. 

The new typing of r_sprintf is as follows. 

Fixpoint r_sprintf φ : string → type_transf_of_fmt φ string := 
  match φ with 
  | Stop ⇒ λa. a 
  | Data P φ ⇒ λa x. r_sprintf φ (a ^P x) 
  | Lit s φ ⇒ λa. r_sprintf φ (a ^ s) 
  end. 

Definition sprintf φ := r_sprintf φ "". 
5.1 Polymorphic Functional Formats 

In this version, the type given to a functional format takes the form kt tf, where 
tf is a type transformer. 

Definition kt (tf : Set → Set) := ∀α. (string → α) → string → tf α. 

Accordingly, the new typings of kid, kdata and lit are: 

Definition kid : kt idt := λα. λk a. k a. 

Definition kdata P : kt (λα. X P → α) := λα. λk : string → α. λa x. k (a ^P x). 

Definition lit x : kt idt := λα. λk a. k (a ^ x). 

Observe that, in this version, no additional argument is needed in lit and kdata 
(or its instances such as sint). 

The counterpart of type unification shown in equations (6) to (9) of the 
introduction is performed in the following version of function composition. 

Definition u_seq (tg, tf : Set → Set) : kt tg → kt tf → kt (tg o tf) := 
  λg f. λα. λk. g (tf α) (f α k). 

We use the infix notation @ for u_seq. 

5.2 Translation 

Definition sprintfk (tf : Set → Set) : kt tf → tf string := λf. f string (λs. s) "". 

Fixpoint kformat φ : kt (type_transf_of_fmt φ) := 
  match φ with 
  | Stop ⇒ kid 
  | Data P φ ⇒ (kdata P) @ (kformat φ) 
  | Lit x φ ⇒ (lit x) @ (kformat φ) 
  end. 

As desired, formats can be composed. For example, kformat example is 
convertible with (kformat (Lit foo (Str Stop))) @ (kformat (Lit bar (Int Stop))). A 
format can even be composed with itself, as in 
let kex = kformat example in kex @ kex. 

5.3 Correctness of sprintfk w.r.t. sprintf 

The proof is along the same lines as before. In the induction steps, we have to 
recognize a higher-order pattern involving another kind of function composition, 
which is defined by f o₂ g := λx y. f (g x y). 

The two key remarks are: 

∀P. kdata P ≡ λα. λk : string → α. k o₂ (r_X P) (16) 

and 

∀tf : Set → Set, ∀f : kt tf, ∀P. (kdata P) @ f ≡ λα. λk. (f α k) o₂ (r_X P) (17) 

where ≡ stands for convertibility. We can inline these identities in order to get 
versions of r_sprintf and kformat which are convertible with the original ones. 

Fixpoint r_sprintf1 φ : string → type_transf_of_fmt φ string := 
  match φ with 
  | Stop ⇒ λa. a 
  | Data P φ ⇒ (r_sprintf1 φ) o₂ (r_X P) 
  | Lit s φ ⇒ (r_sprintf1 φ) o (λa. a ^ s) 
  end. 

Fixpoint kformat1 φ : kt (type_transf_of_fmt φ) := 
  match φ with 
  | Stop ⇒ kid 
  | Data P φ ⇒ λα. λk. (kformat1 φ α k) o₂ (r_X P) 
  | Lit x φ ⇒ λα. λk. (kformat1 φ α k) o (λa. a ^ x) 
  end. 

Using them, we can prove that: 

∀φ. r_sprintf φ = kformat φ string (λs. s) (18) 

by a straightforward induction over φ, and we get the desired theorem in the 
same way as in section 4. 

Theorem sprintf_sprintfk: ∀φ. sprintf φ = sprintfk (kformat φ). 
Abstract. We present in this paper the development of a decision pro- 
cedure for affine plane geometry in the Coq proof assistant. Among the 
existing decision methods, we have chosen to implement one based on the 
area method developed by Chou, Gao and Zhang, which provides short 
and “readable” proofs for geometry theorems. The idea of the method 
is to express the goal to be proved using three geometric quantities and 
eliminate points in the reverse order of their construction thanks to some 
elimination lemmas. 



1 Introduction 

Geometry is one of the most successful areas of automated theorem proving. 
Many difficult theorems can be proved by computer programs using synthetic 
and algebraic methods. A decision procedure using quantifier elimination was 
first introduced by A. Tarski [15]. His method was further improved by 
Collins’ cylindrical decomposition algorithm [4]. Among the efficient methods 
we can cite also the algebraic method of Wu which succeeded in finding the 
proofs of hundreds of geometry theorems [3, 18] and later the method of Chou, 
Gao, Zhang which produces short and readable proofs [2] (they are readable 
in the sense that one can understand these proofs without difficulty as they 
manipulate small terms.) 

Recently, developments have also been produced towards the formalization 
of elementary geometry in proof assistants: Hilbert’s Grundlagen [10] have been 
formalized in Isabelle/Isar by Laura Meikle and Jacques Fleuriot [14], and by 
Christophe Dehlinger in the Coq system [5]. Gilles Kahn has formalized Jan von 
Plato’s constructive geometry in the Coq system [12, 17]. Frédérique Guilhot has 
done a large development in Coq dealing with French high school geometry [7]. 

We believe that automated theorem proving and interactive proof develop- 
ment are complementary to formal proof generation. Proof assistants can deal 
with a very large span of theorems, but they need automation to ease the devel- 
opment. The goal of this work is to bring the level of automation provided by the 
method of Chou, Gao and Zhang to the Coq proof assistant [13]. This is done 
by implementing the decision procedure as a Coq tactic. A tactic is a program 
which expresses the sequence of the basic logical steps needed to formally prove 
a theorem. 
Formalizing a decision procedure within Coq has not only the advantage of 
simplifying the tedious task of proving geometry theorems but also allows us to 
combine the geometrical proofs provided by the tactic with arbitrary complicated 
proofs developed interactively using the full strength of the underlying logic of 
the theorem prover. For instance, theorems involving induction over the number 
of points in the figure can be formalized in Coq. This approach has also the 
advantage of providing a higher level of reliability than ad hoc geometry theorem 
provers because the proofs generated by our tactic are double checked by the 
Coq internal proof-checker. 

The issues related to the treatment of nondegeneracy conditions are crucial; 
this is emphasized in our formalization. 

This paper is arranged as follows: we will first give an overview of the decision 
method, and then we will explain how it has been implemented in the Coq proof 
assistant. 

2 The Chou, Gao and Zhang Decision Procedure 

Chou, Gao and Zhang’s decision procedure is the mechanization of the area 
method. It is a mix of algebraic and synthetic methods. The idea of the method is 
to express the goal in a constructive way and treat the points in the reverse order 
of their construction. The treatment of each point consists in eliminating every 
occurrence of the point in the goal. This can be done thanks to the elimination 
lemmas. 

To be in the language of the procedure, the goal to be proved must verify 
two conditions: first the theorem has to be stated as a sequence of constructions 
(constructing points as intersections of lines or on the parallel to a line passing 
through a point, etc.)^1. Second, the goal must be expressed as an arithmetic 
expression using only three geometric quantities: the ratio of two oriented 
distances ($\overline{AB}/\overline{CD}$) with AB parallel to CD, the signed area of a triangle ($S_{ABC}$) and 
the Pythagoras difference (the difference between the sum of the squares of two 
sides of a triangle and the square of the other side, $P_{ABC} = AB^2 + BC^2 - AC^2$)^2. 
These three geometric quantities are sufficient to deal with a large part of plane 
geometry as shown in Table 1 on page 228. They verify elementary properties 
such as $S_{AAB} = 0$, $S_{ABC} = -S_{BAC}$ and $S_{ABC} = S_{BCA}$. That will be made 
explicit in Sect. 3. For the time being only the first two geometric quantities are 
formalized in our development in Coq; this means that we can only deal with affine 
plane geometry. The formulas treated by our tactic are those of the form: 

$\forall A_1, \dots, A_n : \mathit{Point},\ C_1(A_o, \dots, A_p) \to \dots \to C_j(A_q, \dots, A_r) \to R = 0$ 

^1 Note that it is subject to conditions (that may not be decidable) and that 
constructive in this context does not mean the same as constructible with ruler and compass 
(this will be detailed later in Sect. 3.1). Note also that different constructions can 
lead to slightly different nondegeneracy conditions and so slightly different theorems. 
^2 Note that $P_{ABC} = -2\,(\overrightarrow{AB} \cdot \overrightarrow{BC})$ and $4 S_{ABC}^2 = (\overrightarrow{AB} \wedge \overrightarrow{BC})^2$, where $\cdot$ is dot product 
and $\wedge$ is vector cross product. 
where R is an arithmetical expression containing signed areas and ratios and 
the $C_i$ are predicates expressing the sequence of constructions. For each constructed 
point there is some $C_i$ stating how it has been constructed. Note that the 
dependency graph of the constructions must be cycle free. 
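
The three geometric quantities are easy to compute in coordinates, which gives a quick sanity check of the elementary properties quoted in this section. The Python sketch below is only an illustration under assumptions: points are pairs of integers, and the function names signed_area, pythagoras_diff and ratio are introduced here; the Coq development itself is axiomatic and does not use coordinates.

```python
from fractions import Fraction

def signed_area(a, b, c):
    """Signed area S_ABC: positive when A, B, C are in counterclockwise order."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    return Fraction(1, 2) * ((bx - ax) * (cy - ay) - (cx - ax) * (by - ay))

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def pythagoras_diff(a, b, c):
    """Pythagoras difference P_ABC = AB^2 + BC^2 - AC^2."""
    return sq_dist(a, b) + sq_dist(b, c) - sq_dist(a, c)

def ratio(a, b, c, d):
    """Ratio of oriented distances AB/CD; assumes AB parallel to CD and C != D."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = d[0] - c[0], d[1] - c[1]
    assert ux * vy - uy * vx == 0, "AB and CD must be parallel"
    return Fraction(ux, vx) if vx != 0 else Fraction(uy, vy)

if __name__ == "__main__":
    A, B, C, D = (0, 0), (4, 0), (1, 3), (3, 3)
    print(signed_area(A, B, C), signed_area(B, A, C), signed_area(B, C, A))
    print(pythagoras_diff(A, B, C))
    print(ratio(A, B, C, D))
```

With these sample points, the identities in footnote 2 can also be checked: the dot product of AB and BC is -12 and their cross product is 12.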

To eliminate a point from the goal we need to apply one of the elimination 
lemmas shown on Table 2 on page 229. This table can be read as follows: 
to eliminate a point Y, choose the line corresponding to the way Y has been 
constructed, and apply the formula given in the column corresponding to the 
geometric quantity in which Y is used. The lemmas rewrite any geometric 
quantity containing an occurrence of a point Y ($S_{ABY}$ or $\overline{AY}/\overline{CD}$ for any A, B, C and D 
such that $AY \parallel CD$) into an expression with no occurrence of Y^3. There is one 
lemma for each combination of construction and geometric quantity. As far as 
geometry of incidence is concerned, we have five ways to construct a point and 
two geometric quantities; this provides ten elimination lemmas. Note that there 
are more constructions than needed (some constructions can be expressed using 
others). This is used to simplify the statement of the theorems and shorten the 
proofs by providing specific elimination lemmas for non primitive constructions. 
The constructions involving a quantity λ can be used to build a point at some 
fixed distance (if λ is instantiated) or at any distance (if λ is kept as a variable). 
These last constructions are used to build what are called “semi-fixed points”. 

When all the constructed points have disappeared from the goal, the result is 
an arithmetic expression containing geometric quantities using only free points 
(free points are those that can be freely moved in the plane, those whose position 
can be arbitrarily chosen while drawing the figure) . At this step these geometric 
quantities use only free points but are not necessarily independent. In case these 
geometric quantities are not independent we decompose them using three non 
collinear points (that can be seen as a base). This will be detailed in Sect. 3. If 
all the geometric quantities are independent, the goal can be seen as an equation 
between two polynomials, which can be easily decided. 

The steps of the method can be summarized using this informal description: 

— express the goal in a constructive way (as a sequence of basic constructions) 
using only the three geometric quantities; 

— remove bound points from the goal using the elimination lemmas; 

— change the goal into an expression containing only independent geometric 
quantities; 

— decide if the resulting equality is universally true or not. 



2.1 Example 

Let’s consider the midpoint theorem as an example: 

^3 Note that every occurrence of Y is removed only if the points present in the geometric 
quantity containing Y (A, B, C and D) are different from Y; this problem is treated 
in the implementation. 
Table 1. Expressing some common geometric notions using S, ratios and P 

Geometric notions             Formalization 
A, B and C are collinear      $S_{ABC} = 0$ 
AB ∥ CD                       $S_{ABC} = S_{ABD}$ 
I is the midpoint of AB       $\overline{AB}/\overline{AI} = 2 \wedge S_{ABI} = 0$ 
AB ⊥ BC                       $P_{ABC} = 0$ 
AB ⊥ CD                       $P_{ACD} = P_{BCD}$ 
A = B                         $P_{ABA} = 0$ 



Example 1 (Midpoint theorem). Let ABC be a triangle, and let A' and B' be 
the midpoints of BC and AC respectively. Then the line A' B' is parallel to the 
base AB. 

Proof (using the method). We first translate the goal 
(A'B' ∥ AB) into its equivalent using the signed area: 

$S_{A'B'A} = S_{A'B'B}$ 

Then we eliminate compound points from the goal, starting 
with the last point in the order of their construction. 
The geometric quantities containing an occurrence of 
B' are $S_{A'B'B}$ and $S_{A'B'A}$; B' has been constructed using the first construction 
on Table 2 with $\lambda = \frac{1}{2}$: 

$S_{A'B'A} = S_{AA'B'} = \frac{1}{2} S_{AA'A} + \frac{1}{2} S_{AA'C} = \frac{1}{2} S_{AA'C}$ 

and 

$S_{A'B'B} = S_{BA'B'} = \frac{1}{2} S_{BA'A} + \frac{1}{2} S_{BA'C}$ 

The new goal is 

$S_{AA'C} = S_{BA'A} + S_{BA'C}$ 

Now we eliminate A' using: 

$S_{CAA'} = \frac{1}{2} S_{CAB} + \frac{1}{2} S_{CAC} = \frac{1}{2} S_{CAB}$ 
$S_{ABA'} = \frac{1}{2} S_{ABB} + \frac{1}{2} S_{ABC} = \frac{1}{2} S_{ABC}$ 
$S_{CBA'} = \frac{1}{2} S_{CBB} + \frac{1}{2} S_{CBC} = 0$ 

The new goal is: 

$\frac{1}{2} S_{CAB} = \frac{1}{2} S_{ABC}$ 

The proof is completed as $S_{CAB} = S_{ABC}$. 
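
The elimination steps of the midpoint example can be checked numerically. The Python sketch below assumes the elimination lemma for a point Y on line PQ with PY/PQ = λ has the form S_ABY = λ S_ABQ + (1 − λ) S_ABP, which is the shape used in the example with λ = 1/2; Table 2 itself is not reproduced here, so this form, the coordinates, and the names on_line and signed_area are assumptions of the sketch. It illustrates the rewriting on sample triangles and is not a proof.

```python
from fractions import Fraction

def signed_area(a, b, c):
    """Signed area S_ABC computed from coordinates."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    return Fraction(1, 2) * ((bx - ax) * (cy - ay) - (cx - ax) * (by - ay))

def on_line(p, q, lam):
    """The point Y on line PQ with oriented ratio PY/PQ = lam."""
    return (p[0] + lam * (q[0] - p[0]), p[1] + lam * (q[1] - p[1]))

def eliminated(a, b, p, q, lam):
    """Right-hand side of the assumed elimination lemma for S_ABY, free of Y."""
    return lam * signed_area(a, b, q) + (1 - lam) * signed_area(a, b, p)

def midpoint_theorem_holds(A, B, C):
    """Check S_A'B'A = S_A'B'B, i.e. A'B' parallel to AB, for midpoints A' and B'."""
    half = Fraction(1, 2)
    A1 = on_line(B, C, half)   # A' is the midpoint of BC
    B1 = on_line(A, C, half)   # B' is the midpoint of AC
    return signed_area(A1, B1, A) == signed_area(A1, B1, B)

if __name__ == "__main__":
    A, B, C = (0, 0), (6, 0), (2, 4)
    half = Fraction(1, 2)
    A1, B1 = on_line(B, C, half), on_line(A, C, half)
    # First elimination step of the example: S_AA'B' = 1/2 S_AA'A + 1/2 S_AA'C
    print(signed_area(A, A1, B1), eliminated(A, A1, A, C, half))
    print(midpoint_theorem_holds(A, B, C))
```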



Table 2. Elimination lemmas 
$S_{ABCD}$ is a notation for $S_{ABC} + S_{ACD}$. 
3 Implementation in Coq 

The formalization of the procedure consists in choosing an axiomatic, proving the 
propositions needed by the tactic and writing the tactic itself. These three steps 
are described in this section, but to ease the development, in our implementation 
we have intermixed the proofs of the propositions and the tactics. Our tactic is 
decomposed into sub-tactics performing the following tasks (we will give their 
precise description later): 

— initialization; 

— simplification; 

— unification; 

— elimination; 

— conclusion. 

The simplification and unification tactics are used to prove some propositions 
needed by the other sub-tactics. 

Our tactic is mainly implemented using the Ltac language included in the 
Coq system. This language provides primitives to describe Coq tactics within 
Coq itself (without using Ocaml, the implementation language of Coq). But 
some of our sub-tactics are implemented using the reflection method [11,9,1]. 
This method consists of reflecting a subset of the Coq language (here the 
arithmetical expressions built on the geometric quantities) into an object of the Coq 
language itself (in our case an inductive type denoting arithmetical expressions). 
This means that the computation performed by the traditional tactic in some 
metalanguage (Ltac or Ocaml) is here done using the internal reduction of Coq. 
The reflexive tactic is composed of: 

— a small piece of Ltac (or Ocaml) to reflect the object language into the 
metalanguage, 

— a Coq term which solves the problem expressed in the metalanguage, 

— a Coq term which reflects the metalanguage into the object language, 

— and the proof of the validity of the transformation performed by this term. 

This method has the advantage of producing more efficient tactics and shorter 
proofs because the application of the tactic is just one computation step (using 
the conversion rule of the calculus of inductive constructions). 
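As a loose analogy for the four components listed above, the Python sketch below reifies a sum into a syntax tree, normalizes the tree, and checks on sample values that the normal form denotes the same number as the original tree. It is not Coq code: the names `reify`, `interp`, `normalize` and `equal_by_reflection` are hypothetical, and the validity proof that Coq requires is replaced here by a test on one environment.

```python
def reify(text):
    """Metalanguage step: turn a string such as 'a + b + c' into a syntax tree."""
    names = [name.strip() for name in text.split("+")]
    tree = ("var", names[0])
    for name in names[1:]:
        tree = ("add", tree, ("var", name))
    return tree

def interp(tree, env):
    """Map a syntax tree back to the value it denotes in the environment env."""
    if tree[0] == "var":
        return env[tree[1]]
    return interp(tree[1], env) + interp(tree[2], env)

def normalize(tree):
    """Sorted list of leaves: a normal form modulo associativity and commutativity."""
    if tree[0] == "var":
        return [tree[1]]
    return sorted(normalize(tree[1]) + normalize(tree[2]))

def equal_by_reflection(lhs, rhs):
    """Decide equality of two sums by comparing their normal forms."""
    return normalize(reify(lhs)) == normalize(reify(rhs))

# Sample check of the validity property: the normal form denotes the same value.
env = {"a": 2, "b": 5, "c": 11}
tree = reify("c + a + b")
validity_holds = interp(tree, env) == sum(env[name] for name in normalize(tree))
```

In Coq the comparison of normal forms is performed by the kernel's own reduction, which is why a single conversion step suffices once the validity lemma has been proved.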

We have used the reflection method to implement the simplification and 
unification tactics. We have not chosen to use the reflection method for the 
whole tactic for two reasons: 

1. We believe that the proof process would not be much faster and the generated 
proof would be comparable in size. Indeed the proofs generated by our tactic 
are roughly a sequence of a few applications of the elimination lemmas. 

2. Expressing the tactic as a Coq term, and proving the validity of this trans- 
formation would have been cumbersome. We make heavy use of the high 
level primitives provided by the Ltac language such as matching the con- 
text for terms or sub-terms, clearing hypotheses etc. All this machinery and 
the proof of its validity would have to be developed within Coq to use the 
reflection method on the whole tactic. 

3.1 The Axiomatic 

There are many axiomatics for elementary geometry. The best known are the 
axiomatics of Euclid, Tarski and Hilbert [6, 16, 10]. The axiomatic used to for- 
malize this decision method in Coq is inspired by the axiomatic of Chou, Gao 
and Zhang and is given in Table 3. We could define this 
axiomatic as an undirected, semi-analytic, axiomatic with points as primitive 
objects. We mean by these adjectives that: 

— This axiomatic is unordered. This simplifies the treatment of many cases, 
but it has the drawback that one cannot express the 
Between predicate which can be found in Tarski's axiomatic. 

— This axiomatic contains the axioms of a field. This means that there is some 
notion of numbers, but it is still coordinate free. 

— This axiomatic has the characteristic of being based on points: lines are not 
primitive objects as in Hilbert’s axiomatic for example which contains not 
only Points but also Lines as primitive objects. This means that we can not 
quantify over the set of lines, etc. 

The first axiom is the fact that we have a set of points. 

We assume that we have a field of characteristic different from two. The 
axioms of a field are standard and hence omitted. The fact that the characteristic 
is different from two is used first to simplify the axiomatic (because 
$2 \neq 0 \wedge S_{ABC} = -S_{BAC} \rightarrow S_{AAC} = 0$) and second to allow the construction of the 
midpoint of a segment without explicitly stating that two is different from 
zero. 

We assume that we have one binary function ($\overline{AB}$) and one ternary function ($S_{ABC}$) 
from points to our field ($F$). The first denotes the signed distance between A and 
B; the second represents the signed area of the triangle ABC. 

The axioms of dimension express that all points are in the same plane and 
not all points are collinear. 

The axioms of construction express that we can build a point on a line deter- 
mined by two points A and B at some given distance. The given distance is not 
necessarily a “constructive distance” so the notion of theorems stated construc- 
tively is not the same as the notion of constructible with ruler and compass. The 
constructed point is unique if A is different from B. 

The axiom of proportions is central and gives a relation between oriented 
distances and signed areas. 

Our axiomatic differs in some respects from the axiomatic of Chou, Gao and 
Zhang. 

Footnote (on the Between predicate): This issue can be addressed using an ordered field. In this case we can express the 
Between predicate and we could generalize the procedure to allow the treatment of 
a goal which is an inequality. But the procedure could still not deal with inequalities 
in the hypotheses. 

Footnote (on the midpoint): The midpoint is given such attention because it is used to prove the validity of 
constructions involving parallel lines. 

Table 3. The axiomatic 



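Points            Point : Set 

Field             F is a field 
                  $2 \neq 0$ 

Signed distance   $\overline{\,\cdot\,\cdot\,}$ : Point $\rightarrow$ Point $\rightarrow$ F 
                  $\overline{AB} = 0 \leftrightarrow A = B$ 

Signed area       $S$ : Point $\rightarrow$ Point $\rightarrow$ Point $\rightarrow$ F 
                  $S_{ABC} = S_{CAB}$ 
                  $S_{ABC} = -S_{BAC}$ 

Chasles' axiom    $S_{ABC} = 0 \rightarrow \overline{AB} + \overline{BC} = \overline{AC}$ 

Dimension         $\exists A, B, C : \mathrm{Point},\ S_{ABC} \neq 0$ 
                  $S_{ABC} = S_{DBC} + S_{ADC} + S_{ABD}$ 

Construction      $\forall r : F,\ \exists P : \mathrm{Point},\ S_{ABP} = 0 \wedge \overline{AP} = r\,\overline{AB}$ 
                  $A \neq B \wedge S_{ABP} = 0 \wedge \overline{AP} = r\,\overline{AB} \wedge S_{ABP'} = 0 \wedge \overline{AP'} = r\,\overline{AB} \rightarrow P = P'$ 

Proportions       $A \neq C \rightarrow S_{PAC} \neq 0 \rightarrow S_{ABC} = 0 \rightarrow \frac{\overline{AB}}{\overline{AC}} = \frac{S_{PAB}}{S_{PAC}}$ 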



First we do not assume that we have a notion of collinearity, this notion is 
defined using the signed area. In [2] the notion of collinearity of three points 
A,B and C is used to express some axioms and then proved to be equivalent to 
Sabc = 0. 

Second we can divide arbitrary distances, whereas Chou's axiomatic restricts 
to ratios of oriented distances $\overline{AB}/\overline{CD}$ where the lines AB and CD are parallel. 
The coherence is preserved because the oriented distance can be interpreted by 
the standard analytic model. The fact that we can divide arbitrary distances 
means that to give an interpretation to the distance function we have to give an 
orientation to the lines of our plane. 
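To make the remark about orientation concrete, here is a minimal Python sketch of one possible analytic interpretation. It assumes that every line is oriented by the lexicographic order on its direction vector; this particular choice, and the names `signed_distance` and `signed_area`, are assumptions made for illustration only.

```python
import math

def signed_area(a, b, c):
    """Signed area of the triangle abc in the Cartesian plane."""
    return ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2

def signed_distance(a, b):
    """Oriented distance from a to b.

    The sign is positive when the vector ab is lexicographically non-negative,
    which gives the same orientation to all lines sharing a direction.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    sign = 1 if (dx, dy) >= (0, 0) else -1
    return sign * math.hypot(dx, dy)

# Three collinear points: Chasles' axiom AB + BC = AC should hold.
A, B, C = (0, 0), (3, 4), (-6, -8)
chasles_lhs = signed_distance(A, B) + signed_distance(B, C)
chasles_rhs = signed_distance(A, C)

# Four arbitrary points: S_ABC = S_DBC + S_ADC + S_ABD should hold.
P, Q, R, D = (0, 0), (4, 1), (1, 5), (2, -3)
dimension_lhs = signed_area(P, Q, R)
dimension_rhs = signed_area(D, Q, R) + signed_area(P, D, R) + signed_area(P, Q, D)
```

With such an orientation the quotient of two arbitrary signed distances is a well-defined field element, even when the two supporting lines are not parallel.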

But the decision procedure requires explicitly that for every ratio of oriented 
distances $\overline{AB}/\overline{CD}$, AB is parallel to CD. Our lemmas used in the procedure state 
explicitly that the lines are parallel. This means that in our formalization one 
can write the ratio of two arbitrary distances but it cannot be dealt with by 
the decision procedure. This choice of formalization implies that the decision 
procedure is not complete (the goals which are not in the language of the tactic 
will be rejected). 

This formalization is more convenient because it is more general and it allows 
us to use an “ordinary” field, and to use the standard tactic dealing with fields provided 
by Coq. Otherwise we would have had to give some axioms and prove some 
properties concerning the link between the ratio function and products, sums, 
etc. It also allows us to manipulate ratios of distances which are supported by 
parallel lines without explicitly stating that these lines are parallel. This is 
sometimes useful. For example with the same assumptions as in the midpoint theorem, 
if we want to state the value of the ratio $\overline{A'B'}/\overline{AB}$, we do not want to add an assumption stating 
that $A'B' \parallel AB$ because it is a consequence of the other assumptions. As a result of 
this choice, two invariants must be kept along the proof: 

1. for each denominator of a fraction there is a proof in the context that it is 
different from zero; 

2. for each ratio of oriented distances $\overline{AB}/\overline{CD}$ there is a proof in the context that 
AB is parallel to CD. 



3.2 Propositions Needed by the Tactic 

Here is a quick overview of the propositions that have been proven using the 
Coq system for this development: 

Basic propositions are used to rewrite geometric quantities, etc. These are the 
very basic propositions used by the unification and simplification tactics; they 
come very early in order to take advantage of the unification and simplifica- 
tion tactics as soon as possible. 

Lemmas are used in the whole development. 

Construction lemmas are used to prove that each construction shown on Ta- 
ble 2 is a consequence of the axiom of construction. 

Constructed points elimination lemmas are used to eliminate fixed points 
and preserve our invariants. 

Free points lemmas are used to express geometric quantities using indepen- 
dent variables. 



3.3 The Tactic Itself 

We give in this section a detailed description of the sub-tactics we use. 



Initialization Tactic 

1. The initialization tactic (called geoinit) checks that the goal is compatible 
with the decision procedure. (This includes verification that the invariants 
are initially true.) 

2. It unfolds all the definitions which are not treated directly by the decision 
procedure (for example, midpoint is expanded as a ratio of distances and a 
statement expressing collinearity). 

3. It introduces all the hypotheses in the context. 

4. It decomposes the logical part of the goal if needed (splitting the conjunctions 
and decomposing the compound constructions). 

Example 2. The midpoint theorem is stated using our language [8] in the syntax 
of Coq V8.0 as follows: 

Theorem midpoint_A : 
  forall A B C A' B' : Point, midpoint A' B C -> midpoint B' A C -> 
  parallel A' B' A B. 
geoinit. 

1 subgoal 
A : Point 
B : Point 
C : Point 
A' : Point 
B' : Point 
H : on_line_d A' B C (1 / 2) 
H0 : on_line_d B' A C (1 / 2) 
============================ 
S A' A B' + S A' B' B = 0 

on_line_d A' B C (1 / 2) states that A' is on line BC and $\overline{BA'}/\overline{BC} = \frac{1}{2}$. 

Simplification tactics. The simplification tactic (basic_simpl) performs ba- 
sic simplifications in the hypotheses and the goal. Note that we need to perform 
exactly the same simplifications in the goal and hypotheses in order to preserve 
our invariants. For instance if the denominator of a fraction is simplified, the 
same simplification must be applied to the proof that this denominator is non- 
zero otherwise we lose the invariant that we have a proof that every term which 
syntactically occurs in the denominator of a fraction is non-zero. 

Basic simplifications consist in: 

— removing degenerate directed distances or signed areas (e.g. $\overline{AA}$, $S_{AAB}$, ...) 

— rewriting $-(-x)$ into $x$ 

— rewriting $-0$ into $0$ 

— rewriting $0 * x$ and $x * 0$ into $0$ 

— rewriting $1 * x$ and $x * 1$ into $x$ 

— rewriting $x + 0$ and $0 + x$ into $x$ 

This tactic is necessary to keep the goal as small as possible. Not simplify- 
ing the goal at each step would lead to huge terms. Examples show that the 
computation becomes intractable without simplification. 
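The rewriting rules listed above can be sketched as a bottom-up traversal of an expression tree. The Python below is an illustration only: the tuple encoding (`"area"`, `"dist"`, `"neg"`, `"mul"`, `"add"`) and the function name `basic_simpl` are assumptions, and the real tactic additionally applies the same rewrites to the non-zero hypotheses in order to preserve the invariants.

```python
from fractions import Fraction

def basic_simpl(e):
    """Apply the basic simplification rules bottom-up to an expression tree.

    Leaves are numbers; inner nodes are tuples such as ("area", "A", "B", "C"),
    ("dist", "A", "B"), ("neg", x), ("mul", x, y) or ("add", x, y).
    """
    if not isinstance(e, tuple):
        return e
    op = e[0]
    if op == "area":  # a signed area with a repeated point is zero
        return 0 if len(set(e[1:])) < 3 else e
    if op == "dist":  # the distance from a point to itself is zero
        return 0 if e[1] == e[2] else e
    if op == "neg":
        x = basic_simpl(e[1])
        if x == 0:
            return 0                      # -0 = 0
        if isinstance(x, tuple) and x[0] == "neg":
            return x[1]                   # -(-x) = x
        return ("neg", x)
    a, b = basic_simpl(e[1]), basic_simpl(e[2])
    if op == "mul":
        if a == 0 or b == 0:
            return 0                      # 0 * x = x * 0 = 0
        if a == 1:
            return b                      # 1 * x = x
        if b == 1:
            return a                      # x * 1 = x
    if op == "add":
        if a == 0:
            return b                      # 0 + x = x
        if b == 0:
            return a                      # x + 0 = x
    return (op, a, b)

half = Fraction(1, 2)
goal = ("add",
        ("mul", half, ("area", "A'", "A", "C")),
        ("mul", half, ("area", "A'", "A", "A")))
simplified = basic_simpl(goal)
```

On this input the degenerate area `S A' A A` disappears and only the first product remains, which mirrors the kind of shrinking the paragraph above describes.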

Unification tactics. (unify_signed_areas, unify_signed_distances) 

There are two unification tactics, one for each geometric quantity. The uni- 
fication tactics change the goal and hypotheses in order to unify the geometric 
quantities. For instance if both $\overline{AB}$ and $\overline{BA}$ are used in the context or in the 
goal, $\overline{AB}$ is changed into $-\overline{BA}$. 

Footnote: The choice to rewrite $\overline{AB}$ or $\overline{BA}$ is made arbitrarily by the tactic. 

This has two purposes: 

1. It can speed up some steps, because any rewrite of one of these quantities 
will be done only once. 

2. It is necessary that geometric quantities which are equal have the same form. 
Indeed the last step of the procedure is a call to the Coq standard field 
tactic on an arithmetic expression containing independent geometric quan- 
tities, and field would consider $\overline{AB}$ and $\overline{BA}$ as different variables (because 
field doesn't know anything about the oriented distance function). 

Example 3. In this context: 

H9 : S C A B <> 0 
H8 : S B A C <> 0 
H1 : S A B C <> 0 
============================ 
S P B C / S A B C + S P A C / S B A C + S P A B / S C A B = 1 

the tactic unify_signed_areas changes the goal into: 

H8 : - S A B C <> 0 
H1 : S A B C <> 0 
============================ 
S P B C / S A B C + S P A C / - S A B C + S P A B / S A B C = 1 



Elimination tactic. This tactic (called eliminate_all) first searches the con- 
text for a point which is not used to build another point (a leaf in the dependency 
graph). Then for each occurrence of the point in the goal, it applies the right 
lemma from Table 2 by finding in the context how the point has been constructed 
and which geometric quantity it appears in. Finally it removes the hypotheses 
stating how the point has been constructed from the context. 

Note that some lemmas have a side condition on their application; in this 
case a recursive call to the whole tactic is made. If the condition is true then the 
lemma is applied; otherwise we need to do a step of classical reasoning: 
we reason by cases on the side condition. The formalization in Coq emphasizes 
the use of this classical reasoning step. As noted before, the elimination lemmas 
given in Table 2 eliminate an occurrence of a point Y only 
if Y appears only once in the geometric quantity (A, B, C and D must be 
different from Y). If Y appears twice in a signed area S, this is not a problem because then 
the geometric quantity is zero, and so has already been eliminated by the simplification 
phase. But if Y appears twice in a ratio (for instance in $\overline{AY}/\overline{CY}$) this is a special 
case which needs to be treated separately. This is done in the implementation. 
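The elimination of a point constructed on a line can be sketched as follows. The lemma used here is the one visible in Example 4: if Y lies on PQ with $\overline{PY}/\overline{PQ} = \lambda$, then $S_{ABY} = \lambda S_{ABQ} + (1 - \lambda) S_{ABP}$. The function names, the tuple encoding of areas and the sample coordinates are assumptions made for this illustration.

```python
from fractions import Fraction

def rotate_to_last(area, y):
    """Cyclically rotate (a, b, c) so that y comes last; rotations keep the sign."""
    a, b, c = area
    if c == y:
        return (a, b, c)
    if b == y:
        return (c, a, b)
    if a == y:
        return (b, c, a)
    raise ValueError("point does not occur in the signed area")

def eliminate(area, y, p, q, lam):
    """Rewrite a signed area containing y once, where y is on line pq with ratio lam.

    Returns a list of (coefficient, signed area) terms that no longer mention y.
    """
    a, b, _ = rotate_to_last(area, y)
    return [(lam, (a, b, q)), (1 - lam, (a, b, p))]

def area_value(area, coords):
    """Numeric value of a signed area for given point coordinates."""
    (ax, ay), (bx, by), (cx, cy) = (coords[name] for name in area)
    return Fraction((bx - ax) * (cy - ay) - (cx - ax) * (by - ay), 2)

F = Fraction
coords = {"A": (F(0), F(0)), "B": (F(4), F(1)), "C": (F(1), F(5)),
          "A'": (F(5, 2), F(3)), "B'": (F(1, 2), F(5, 2))}
terms = eliminate(("A'", "A", "B'"), "B'", "A", "C", F(1, 2))
rewritten_value = sum(coef * area_value(t, coords) for coef, t in terms)
```

The terms produced for `S A' A B'` match the first half of the goal shown in Example 4, and the numeric values agree on the sample triangle.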



Footnote: field is a reflexive tactic included in the distribution of Coq. It decides equality on 
any field defined by the user. 

Example 4. In this context: 

1 subgoal 
A : Point 
B : Point 
C : Point 
A' : Point 
B' : Point 
H : on_line_d A' B C (1 / 2) 
H0 : on_line_d B' A C (1 / 2) 
============================ 
S A' A B' + S B A' B' = 0 

the tactic eliminate B' changes the goal into: 

1 subgoal 
A : Point 
B : Point 
C : Point 
A' : Point 
B' : Point 
H : on_line_d A' B C (1 / 2) 
============================ 
1 / 2 * S A' A C + (1 - 1 / 2) * S A' A A + 
(1 / 2 * S B A' C + (1 - 1 / 2) * S B A' A) = 0 

Free point elimination tactic. This tactic supposes that the goal is an ex- 
pression using geometric quantities involving only free points (every constructed 
point has already been eliminated by the elimination tactic). The role of this tac- 
tic is to change the goal into an expression involving only independent variables. 
Geometric quantities involving free points are not necessarily independent, they 
are bound by the following relation: 

Sabc = Sdbc + Sadc + Sabd 



But geometric quantities involving free points can be transformed into a set 
of independent variables by expressing them with respect to a base. For that 
purpose we choose three arbitrary non collinear points O, U and V and we use 
the following lemma to rewrite geometric quantities containing more than one 
point which is different from the base points: 

$$S_{OUV} \neq 0 \rightarrow S_{ABC} = \frac{1}{S_{OUV}} \begin{vmatrix} S_{OUA} & S_{OVA} & 1 \\ S_{OUB} & S_{OVB} & 1 \\ S_{OUC} & S_{OVC} & 1 \end{vmatrix}$$
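The determinant lemma can be checked on concrete coordinates. The sketch below assumes the reading in which the determinant is divided by $S_{OUV}$; the function names and the sample points are illustrative assumptions.

```python
from fractions import Fraction

def signed_area(a, b, c):
    """Signed area of the triangle abc, as an exact rational."""
    return Fraction((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]), 2)

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def area_in_base(o, u, v, a, b, c):
    """Express S_abc using only areas of the form S_oup and S_ovp."""
    s_ouv = signed_area(o, u, v)
    if s_ouv == 0:
        raise ValueError("the base points must not be collinear")
    rows = [[signed_area(o, u, p), signed_area(o, v, p), 1] for p in (a, b, c)]
    return det3(rows) / s_ouv

O, U, V = (1, 1), (3, 2), (0, 4)
A, B, C = (0, 0), (4, 1), (1, 5)
direct = signed_area(A, B, C)
via_base = area_in_base(O, U, V, A, B, C)
```

After this rewriting every signed area mentions at most one point outside the base, so the quantities $S_{OUP}$ and $S_{OVP}$ can play the role of independent coordinates of P.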



If there are three points in the context which are known to be not collinear, 
we use them as the base O, U, V. Otherwise we build three non collinear points 
thanks to the dimension axiom. 

Conclusion tactic. When this tactic is called, the goal is an expression of 
independent variables. If the rational equality is universally true, the theorem is 
proved. Otherwise there is some mapping from the variables to the field which 
makes the equality false, and this provides a counter-example to the goal. To check 
if the equality is universally true, this last tactic applies the standard Coq tactic 
field to solve the goal and then solves the generated subgoals (stating that the 
denominators are different from zero) using the hypotheses and/or decomposing 
them using the fact that $a \neq 0 \wedge b \neq 0 \rightarrow ab \neq 0$. As the field tactic does 
not provide counter-examples, our tactic is not able to give counter-examples 
either. This is just a technical limitation. This tactic is small enough to be fully 
explained: 

Ltac field_and_conclude := 
  abstract (field; repeat (assumption || apply nonzeromult); geometry). 

This tactic calls field, and tries to apply one of the assumptions 
to the generated subgoals. If it fails, it decomposes the product in the goal and 
solves the subgoals using geometry. This last tactic is able to solve common 
goals such as $\overline{AB} \neq 0$ when the fact that A is different from B is one of the 
hypotheses. The abstract tactic is here for technical reasons: this Coq tactic 
speeds up the typing process by creating a lemma. 

3.4 A Full Example 

Example 5. In this section we give a detailed description of how the tactic works 
on the first example by decomposing the procedure into small steps. 

forall A B C A' B' : Point, midpoint A' B C -> midpoint B' A C -> 
parallel A' B' A B. 

At this step it would be enough to type autogeom to solve the goal using our 
decision procedure, but for this presentation we mimic the behavior of the deci- 
sion procedure using the sub-tactics described in the previous sections. We give 
the name of the sub-tactics on the left, and the Coq output on the right. 

geoinit.              H : on_line_d A' B C (1 / 2) 
                      H0 : on_line_d B' A C (1 / 2) 
                      ============================ 
                      S A' A B' + S A' B' B = 0 



eliminate B'.         H : on_line_d A' B C (1 / 2) 
                      ============================ 
                      1 / 2 * S A' A C + (1 - 1 / 2) * S A' A A + 
                      (1 / 2 * S B A' C + (1 - 1 / 2) * S B A' A) = 0 

basic_simpl.          H : on_line_d A' B C (1 / 2) 
                      ============================ 
                      1 / 2 * S A' A C + (1 / 2 * S B A' C + 1 / 2 * S B A' A) = 0 

eliminate A'.         ============================ 
                      1 / 2 * (1 / 2 * S A C C + (1 - 1 / 2) * S A C B) + 
                      (1 / 2 * (1 / 2 * S C B C + (1 - 1 / 2) * S C B B) + 
                      1 / 2 * (1 / 2 * S A B C + (1 - 1 / 2) * S A B B)) = 0 

basic_simpl.          1 / 2 * (1 / 2 * S A C B) + 1 / 2 * (1 / 2 * S A B C) = 0 

unify_signed_areas.   1 / 2 * (1 / 2 * S A C B) + 1 / 2 * (1 / 2 * - S A C B) = 0 

field_and_conclude.   Proof completed. 

Footnote: The tactic field from Coq version 8.0 is very slow at solving some goals. The reason 
is that the field tactic is based on another simplification tactic called ring that 
is very slow at computing with constants on abstract domains such as our field 
of measures. We incidentally had to reimplement a version of ring that computes 
using binary numbers in order to be able to compute efficiently the last phase of our 
decision procedure for geometry. 

Footnote: These steps are not exactly the same steps as those executed by our automatic 
procedure (the automatic procedure may treat the points in another order, and 
perform more simplification and unification steps). 

Footnote: For this presentation the fact that A, B, C, D and E are of type point has been 
removed from the context. 



4 Future Work 



This development can be extended in two directions: treat more geometrical 
notions and adapt this work to other axiomatic systems or formal developments. 
The first direction is straightforward and consists in extending the approach 
presented in this paper to deal with circles, perpendiculars, vectors, complex 
numbers and spatial geometry as shown in the book of Chou, Gao and Zhang. 
To achieve this goal, our tactic can easily be adapted, we only need to prove 
the construction and elimination lemmas corresponding to the new geometric 
quantities (for example the pythagoras difference) and update the unification 
and simplification tactics. 

The second direction consists in building bridges to other formalizations of 
geometry (using different axiomatics) . Preliminary work has been done towards 
the integration of our tactic with Frederique Guilhot’s Coq development dealing 
with high school geometry. This integration would open the door to pedagogical 
applications. Involving a student in the process of formally proving a basic 
geometry theorem is not an unreachable goal if the student is spared the burden of 
solving some technical goals thanks to our automatic tactic (for instance goals 
dealing with nondegeneracy conditions). We have initiated a discussion in order 
to define a common language for stating formal geometry theorems [8]. Although 
the logic used to formalize elementary geometry is very simple, the problem of 
defining a common language is not trivial. Indeed, different axiomatics can lead 
to different yet natural definitions for the same informal object. For instance the 
common notion of collinearity in a vector-based approach ($A$, $B$, $C$ are collinear 
if $\exists k,\ \overrightarrow{AB} = k \cdot \overrightarrow{AC}$) is different from the notion of collinearity in our development. 



5 Conclusion 

We have shown in this paper how automatic theorem proving can be combined 
with interactive proof development in the framework of the Coq proof assistant. 
Our implementation gives an example of how the tactic language of Coq (Ltac) 
and the reflection mechanism can be jointly used to build a somewhat short 
development of a tactic (6500 lines) without sacrificing the efficiency (our im- 
plementation within Coq is slower than the original but 20 examples including 
the well-known theorems of Ceva, Menelaus, Pascal and Desargues are proved 
in a couple of minutes). This formalization at the same time emphasizes the 
role of nondegeneracy conditions and provides a way to get rid of them. Our 
formalization also clarifies the usage of classical reasoning. 



Availability. This development is available at: 

http://www.lix.polytechnique.fr/~jnarboux/ChouGaoZhang/index.html 

Abstract. This work describes the proof and uses of a theorem allowing defini- 
tion of recursive functions over the type of λ-calculus terms, where terms with 
bound variables are identified up to α-equivalence. The theorem embodies what 
is effectively a principle of primitive recursion, and the analogues of this theo- 
rem for other types with binders are clear. The theorem's side-conditions require 
that the putative definition be well-behaved with respect to fresh name generation 
and name permutation. A number of examples over the type of λ-calculus terms 
illustrate the use of the new principle. 



1 Introduction 

Theorem-proving tools have long supported the definition of (potentially recursive) al- 
gebraic or inductive types. Not only do the tools prove the existence of such types, and 
establish them within the logical environment, but they also provide methods for defin- 
ing new functions over those types. Typically this is done by proving and using the new 
type’s recursion theorem. 

For example, a definition of a type of lists would assert that the new type had two 
constructors: nil and cons, and that the cons constructor took two arguments, an ele- 
ment of the parameter type a, and another list. The recursion theorem proved to accom- 
pany this type would state: 

$$\forall n\ c.\ \exists h.\ \ h(\mathsf{nil}) = n \ \wedge\ \forall a\ t.\ h(\mathsf{cons}(a,t)) = c(a,t,h(t))$$

This theorem states that given any n, specifying the value of the function-to-be when ap- 
plied to empty lists, and given any c, specifying what should happen when the function 
is applied to a “cons-cell”, there exists a function h that exhibits the desired behaviour. 
The cons behaviour, c, may refer to the component parts of the list, a and t, as well as 
the result of h’s action on t. 

For example, the existence of the map function can be demonstrated by instantiating 
the recursion theorem so that n is λf. nil, and c is λ(a, t, r) f. cons(f(a), r(f)). Note 
how map's additional parameter has been accommodated by making the range of the 
function h itself a function-space. 

K. Slind et al. (Eds.): TPHOLs 2004, LNCS 3223, pp. 241-256, 2004. 

© Springer-Verlag Berlin Heidelberg 2004 

In the friendlier world that users expect to inhabit, the user provides a definition for 
map that looks like 

map f nil = nil 

map f (cons(h, t)) = cons(f(h), map f t) 

It is then the responsibility of the tool implementation to recognise that this is a prim- 
itive recursion over a list, to instantiate the recursion theorem appropriately, and to 
manipulate the resulting theorem so that it again looks like what the user specified. 
If, for example, the tool instantiates the theorem as above, it proves the existence of a 
map function with its parameters in the wrong order; a little more work is required to 
demonstrate the existence of the function that takes its function parameter first. 

Finally, note that when a multi-parameter function’s other arguments do not change 
in the recursive call (as happens with map, but not, for example, with foldl), it is also 
possible to instantiate the theorem differently, but to the same ultimate effect. In the 
case of map, n would be set to nil, and c to λ(a, t, r). cons(f(a), r). The resulting 
instantiation of the recursion theorem would have f free. This could be generalised, 
giving: 

$$\forall f.\ \exists h.\ \ h(\mathsf{nil}) = \mathsf{nil} \ \wedge\ \forall a\ t.\ h(\mathsf{cons}(a,t)) = \mathsf{cons}(f(a), h(t))$$

An appeal to the Axiom of Choice (skolemisation) then moves the variable h out over 
the universally quantified f, demonstrating the existence of an h taking two parameters. 
This trick is not necessary with primitive recursive functions over normal inductive 
types, but it will be useful in some of the examples below. 
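Both instantiations of the list recursion theorem discussed above can be mimicked in Python. The recursor `list_rec` and the two `map` variants below are illustrative names; Python lists stand in for the nil/cons type, and ordinary function definition stands in for the existence claim of the theorem.

```python
def list_rec(n, c):
    """Return the function h with h([]) = n and h([a] + t) = c(a, t, h(t))."""
    def h(xs):
        if not xs:
            return n
        a, t = xs[0], xs[1:]
        return c(a, t, h(t))
    return h

# First instantiation: n = λf. nil and c = λ(a, t, r) f. cons(f(a), r(f)).
# The range of h is a function space, so h takes the list first and f second.
map_list_first = list_rec(lambda f: [], lambda a, t, r: (lambda f: [f(a)] + r(f)))

def map_via_function_space(f, xs):
    """Swap the parameters so that the function argument comes first."""
    return map_list_first(xs)(f)

# Second instantiation: f stays free, n = nil and c = λ(a, t, r). cons(f(a), r).
# Abstracting over f afterwards plays the role of the skolemisation step.
def map_with_f_free(f):
    return list_rec([], lambda a, t, r: [f(a)] + r)

doubled_first = map_via_function_space(lambda x: 2 * x, [1, 2, 3])
doubled_second = map_with_f_free(lambda x: 2 * x)([1, 2, 3])
```

The second form is possible only because f does not change in the recursive call; a function such as foldl, whose accumulator varies, needs the function-space form.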

This much is well-understood technology. Unfortunately, there is no comparable, sim- 
ple story to be told about the type of λ-calculus terms where bound variables are iden- 
tified up to α-equivalence (or indeed, any type featuring α-equivalence). Section 7 dis- 
cusses other approaches to this problem. Presented here is a new approach, based on two 
significant sources: Gordon and Melham's characterisation of α-equivalent λ-calculus 
terms [5], and the Gabbay-Pitts idea of name (or atom) permutation as the basis for 
syntax with binding [4]. 

Gordon and Melham's work is a significant starting point because it defines a type 
in HOL (classical simple type theory) exactly corresponding to the type of (untyped) 
λ-calculus terms, augmented with a “constant” constructor allowing the injection of any 
other arbitrary type into the terms. It corresponds to a type that one might declare in 
SML as 

datatype 'a term = CON of 'a 
                 | VAR of string 
                 | APP of 'a term * 'a term 
                 | LAM of string * 'a term 

except that the bodies of abstractions (under the LAM constructor) are identified up to 
α-equivalence. 

Footnote: The recursion theorem can be strengthened so that ∃h turns into ∃!h. Moving this out past 
universal quantifiers is then only an appeal to the Axiom of Definite Choice. 

Accompanying this type are the core theorems and constants that identify it as an 
implementation of the λ-calculus. There is a substitution function, a function for cal- 
culating free variables, and various theorems that describe how these functions behave. 
There are two further important facts about the Gordon-Melham work: 

- Despite their potentially misleading title, “Five axioms of alpha conversion”, Gor- 
don and Melham did not assert any new HOL axioms. Their type is constructed 
entirely definitionally, on top of a model of de Bruijn terms. 

- Their theory of terms is first-order. By α-equivalence, the following equation holds 

LAM v (VAR v) = LAM u (VAR u) 

but LAM is not a binder at the logical level, and there are no function spaces used. 
The Gordon-Melham theory is not one of higher-order abstract syntax, and there 
are no exotic terms. 
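The idea of identifying named terms through a de Bruijn model can be illustrated with a small Python sketch. Terms are encoded as tuples whose tags follow the SML constructors; the functions `to_de_bruijn` and `alpha_eq` are hypothetical helpers for this illustration and say nothing about how the HOL construction is actually carried out.

```python
def to_de_bruijn(term, bound=()):
    """Translate a named term into a nameless one.

    Terms are ("CON", k), ("VAR", s), ("APP", t, u) or ("LAM", v, t).
    `bound` lists the binders in scope, innermost first, so the index of a
    bound variable is its distance to the binder that captures it.
    """
    tag = term[0]
    if tag == "CON":
        return ("con", term[1])
    if tag == "VAR":
        if term[1] in bound:
            return ("bound", bound.index(term[1]))
        return ("free", term[1])
    if tag == "APP":
        return ("app", to_de_bruijn(term[1], bound), to_de_bruijn(term[2], bound))
    return ("lam", to_de_bruijn(term[2], (term[1],) + bound))

def alpha_eq(t1, t2):
    """Two named terms are alpha-equivalent when their nameless forms coincide."""
    return to_de_bruijn(t1) == to_de_bruijn(t2)

identity_v = ("LAM", "v", ("VAR", "v"))
identity_u = ("LAM", "u", ("VAR", "u"))
constant_u = ("LAM", "v", ("VAR", "u"))
```

Here `identity_v` and `identity_u` translate to the same nameless term, which is the content of the equation LAM v (VAR v) = LAM u (VAR u), while `constant_u` does not.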

Using the CON constructor, it is straightforward to construct new types with binders 
on top of the basic Gordon-Melham terms. In earlier work [7], I implemented the types 
A' and A'* from Barendregt [2], and proved finiteness of developments and the stan- 
dardisation theorem. That work demonstrated that the Gordon-Melham theory is a vi- 
able basis for theorem-proving with binders. 

Nonetheless, in this earlier work, I had to manually define the new types, and almost 
all of the various functions over them. This sort of work is painful and a significant 
obstacle for many users. The current work describes technology for solving one of these 
two important problems, that of function definition. The other problem, that of defining 
new types, is another significant project in its own right. 

My second inspiration, Gabbay's and Pitts's ideas about permutation as a basis for 
syntax with binders, is itself an independent approach to the problem of recursive func- 
tion definition. It is discussed in this role in Section 7. My work attempts to take the 
permutation idea and move it into a setting where some of its fundamental assump- 
tions no longer apply. This is valuable because permutations exhibit properties, even 
in HOL's classical simple type theory, that make them much easier to work with than 
substitutions. 

The rest of this paper is arranged as follows: Section 2 provides a series of motivat- 
ing examples, designed to illustrate a range of different problems in function definition. 
Section 3 is a discussion of how the Gordon-Melham recursion principle can be slightly 
adjusted, enabling the definition of a size function. Section 4 describes how a permuta- 
tion or swap function can be defined using the same principle. Section 5 then presents 
the derivation of the final recursion principle. Section 6 describes how the principle 
forms the basis for an automatic tool for performing function definition, and how it 
copes with the examples of Section 2. I discuss related work in Section 7, and conclude 
in Section 8. 

2 Motivating Examples 

The following functions, with their increasing complexity, provide a test for any prin- 
ciple of function definition. They are presented here in the form in which users would 
want to write them, mimicking how one might write them in a functional language with 
pattern-matching. 

Each of the given functions respects the α-equivalence relation. A clause of the form 

f (LAM v t) = E 

has equal values E for every possible renaming of the bound variable v (possibly subject 
to side-conditions on the equation, see below). If f(LAM v t) were E, but f(LAM u 
(t[v ↦ u])) were E', and these expressions had different values, then this would be 
a contradiction: the two input terms are equal, so their f-values must be equal too. 
This work's new recursion principle embodies restrictions which ensure that the new 
functions are well-behaved in this respect. 

Case analysis: The is_app function distinguishes constructors without looking at 
their arguments. 

is_app (CON k) = F is_app (VAR s) = F 

is_app (APP t u) = T is_app (LAM v t) = F 

Examining constructor arguments: The rator function pulls apart an application 
term and returns the first argument. On other types of term, its value is unspecified. 

rator (APP t u) = t 

There is a sister function, rand, which returns the other argument of an APP. 

Simple recursion: The size function returns a numeric measurement of the size of a 
term. 

size (CON k) = 1 

size (VAR s) = 1 

size (APP t u) = 1 + size t + size u 

size (LAM v t) = 1 + size t 

Recursion mentioning a bound variable: The enf function is true of a term if it is 
in η-normal form. (The FV function returns the set of a term's free variables.) 

enf (CON k)   = T 
enf (VAR s)   = T 
enf (APP t u) = enf t ∧ enf u 
enf (LAM v t) = enf t ∧ 
                (is_app t ∧ rand t = VAR v ⇒ 
                 v ∈ FV (rator t)) 
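The size and enf clauses can be transcribed almost literally into Python over named representatives of terms. The tuple encoding and the function names below are assumptions for illustration; the sketch works on raw syntax and does not itself establish that the functions respect α-equivalence, which is what the recursion principle is for.

```python
def fv(t):
    """Set of free variables of a term encoded as a tagged tuple."""
    tag = t[0]
    if tag == "CON":
        return set()
    if tag == "VAR":
        return {t[1]}
    if tag == "APP":
        return fv(t[1]) | fv(t[2])
    return fv(t[2]) - {t[1]}  # LAM v body

def size(t):
    """Number of constructors in the term."""
    tag = t[0]
    if tag in ("CON", "VAR"):
        return 1
    if tag == "APP":
        return 1 + size(t[1]) + size(t[2])
    return 1 + size(t[2])

def enf(t):
    """True when the term contains no eta-redex (LAM v (APP m (VAR v)) with v not free in m)."""
    tag = t[0]
    if tag in ("CON", "VAR"):
        return True
    if tag == "APP":
        return enf(t[1]) and enf(t[2])
    v, body = t[1], t[2]
    is_eta_redex = (body[0] == "APP" and body[2] == ("VAR", v)
                    and v not in fv(body[1]))
    return enf(body) and not is_eta_redex

eta_redex = ("LAM", "x", ("APP", ("VAR", "g"), ("VAR", "x")))
self_app = ("LAM", "x", ("APP", ("VAR", "x"), ("VAR", "x")))
```

Renaming the bound variable of either sample term to another name that does not occur in it leaves both results unchanged, as the clause for LAM requires.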



Simple recursion (terms as range type): The (admittedly artihcial) stripe function 
replaces all CON terms with (kx.x) . 

stripe (CON k) = LAM "x" (VAR "x") 

stripe (VAR s) = VAR S 

stripe (APP t u) = APP (stripe t) (stripe u) 

stripe (LAM V t) = LAM v (stripe t) 
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Recursion with an additional parameter: Given the ternary type of possible direc- 
tions to follow when passing through a term ({Lt,Rt, In}), corresponding to the 
two sub-terms of an APP constructor and the body of an abstraction, return the set 
of paths (lists of directions) to the occurrences of the given free variable in a term. 



v_posns V (VAR s) 
v_posns V (CON k) 
v_posns V (APP t u) 



V X ^ 

V posns V (LAM X t) 



= if s = V then {[]} else 0 

= 0 

= (IMAGE (CONS Lt) (v_posns v t) ) 

U 

(IMAGE (CONS Rt) (v_posns v u) ) 
= IMAGE (CONS In) (v_posns v t) 



The IMAGE (CONS x) construction above takes a set and adds x to the front of 
all its elements (which are all lists). After this definition is made, it is easy to prove 
(by induction) that 

v ∉ FV(t) ⇒ v_posns v t = ∅ 

Another useful LAM clause immediately follows: 

v_posns v (LAM v t) = ∅ 

One advantage of the new recursion principle is that it automatically derives the 
side-condition attached to the LAM-clause above, necessary to make it valid. 
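
A small Python sketch of v_posns may make the two LAM clauses concrete. It assumes raw tagged-tuple terms and represents each path as a tuple of direction tags so that paths can be stored in a set; both choices are illustrative, not part of the HOL development.

```python
def CON(k): return ("CON", k)
def VAR(s): return ("VAR", s)
def APP(t, u): return ("APP", t, u)
def LAM(v, t): return ("LAM", v, t)

LT, RT, IN = "Lt", "Rt", "In"   # the three directions

def v_posns(v, term):
    """Set of paths (tuples of directions) to the free occurrences of v."""
    tag = term[0]
    if tag == "VAR":
        return {()} if term[1] == v else set()
    if tag == "CON":
        return set()
    if tag == "APP":
        return ({(LT,) + p for p in v_posns(v, term[1])} |
                {(RT,) + p for p in v_posns(v, term[2])})
    x, body = term[1], term[2]
    if x == v:
        return set()             # derived clause: v is bound here, so not free below
    return {(IN,) + p for p in v_posns(v, body)}

t = APP(VAR("v"), LAM("x", APP(VAR("x"), VAR("v"))))
print(sorted(v_posns("v", t)))   # [('Lt',), ('Rt', 'In', 'Rt')]
print(v_posns("x", t))           # set()
```

The second call returns the empty set because x is not free in t, matching the lemma stated above.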
Recursion with varying parameters (terms as range): A variant of the substitution 
function, which substitutes a term for a variable, but further adjusts the term being 
substituted by wrapping it in one application of the variable "f" per binder 
traversed. 

sub' M v (VAR s) = if v = s then M else VAR s 

sub' M v (CON k) = CON k 

sub' M v (APP t u) = APP (sub' M v t) (sub' M v u) 

v ≠ x ∧ "f" ≠ x ∧ x ∉ FV(M) ⇒ 
sub' M v (LAM x t) = 
  LAM x (sub' (APP (VAR "f") M) v t) 

Again, the preconditions on the LAM-clause in this example ensure that the function 
respects α-equivalence. This function can be given another clause for the LAM 
constructor in the same way as for v_posns above, giving 

sub' M v (LAM v t) = LAM v t 

Even with this addition, the equations may not seem to provide a complete specification 
of the behaviour of the function. What, for example, is the behaviour if 
the bound variable is "f"? In fact, the function is well-defined, but its value may 
need to be calculated by first α-converting an abstraction to use a new bound variable. 
That this is always possible is guaranteed by the new recursion principle: it 
requires that there be only finitely many names to which a bound variable cannot 
be renamed. Here, the unavailable names are v, "f" and the names in FV(M). 
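
To illustrate how a value can be calculated when the bound variable is one of the unavailable names, the following Python sketch renames the binder before applying the LAM clause. It works on raw tagged tuples; the helper new (a fresh-name chooser producing names of the form z0, z1, ...) and the use of a name swap to perform the renaming are assumptions of the sketch, anticipating the permutation machinery introduced later in the paper.

```python
import itertools

def CON(k): return ("CON", k)
def VAR(s): return ("VAR", s)
def APP(t, u): return ("APP", t, u)
def LAM(v, t): return ("LAM", v, t)

def fv(term):
    tag = term[0]
    if tag == "CON":
        return set()
    if tag == "VAR":
        return {term[1]}
    if tag == "APP":
        return fv(term[1]) | fv(term[2])
    return fv(term[2]) - {term[1]}

def swapstr(x, y, s):
    return y if s == x else (x if s == y else s)

def swap(x, y, term):
    """Exchange the names x and y everywhere in a term, binders included."""
    tag = term[0]
    if tag == "CON":
        return term
    if tag == "VAR":
        return VAR(swapstr(x, y, term[1]))
    if tag == "APP":
        return APP(swap(x, y, term[1]), swap(x, y, term[2]))
    return LAM(swapstr(x, y, term[1]), swap(x, y, term[2]))

def new(avoid):
    """Hypothetical fresh-name chooser: first z<i> not in the finite set avoid."""
    for i in itertools.count():
        name = "z%d" % i
        if name not in avoid:
            return name

def sub_prime(M, v, term):
    tag = term[0]
    if tag == "VAR":
        return M if term[1] == v else term
    if tag == "CON":
        return term
    if tag == "APP":
        return APP(sub_prime(M, v, term[1]), sub_prime(M, v, term[2]))
    x, body = term[1], term[2]
    if x == v:
        return term                              # extra clause: sub' M v (LAM v t) = LAM v t
    if x == "f" or x in fv(M):
        z = new(fv(term) | fv(M) | {v, "f"})     # avoid every unavailable name
        x, body = z, swap(z, x, body)            # alpha-convert the abstraction
    return LAM(x, sub_prime(APP(VAR("f"), M), v, body))

# Bound variable "f" is unavailable, so the binder is renamed to z0 first.
print(sub_prime(VAR("a"), "v", LAM("f", APP(VAR("f"), VAR("v")))))
```

Only finitely many names are excluded (v, "f" and the free variables of M), so the search in new always terminates.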
3 The Recursion Principle: First Steps 

One of the Gordon-Melham theorems characterising the type of λ-calculus terms is the 
following principle of recursion (where t[v ↦ u] is a capture-avoiding substitution of a 
term u for a variable v throughout term t): 

∀con var app lam. 
  ∃hom. 
    (∀k. hom(CON k) = con(k)) ∧ 
    (∀s. hom(VAR s) = var(s)) ∧                                   (1) 
    (∀t u. hom(APP t u) = app (hom t) (hom u) t u) ∧ 
    (∀v t. hom(LAM v t) = 
       lam (λy. hom(t[v ↦ VAR(y)])) (λy. t[v ↦ VAR(y)])) 

This differs from the usual form of a recursion theorem in the clause for LAM. The lam 
function is not passed the result of a recursive call, but a function instead. This function 
takes a string, substitutes it for the bound variable through the body, and returns the 
result of the recursive function applied to this. Similarly, rather than getting access 
to the body of the abstraction directly, lam only gets to see it hidden behind another 
function that performs a substitution. 

The last of the Gordon-Melham “axioms” states the existence of a function ABS 
such that 

LAM v t = ABS(λy. t[v ↦ VAR(y)])                                  (2) 

Now instantiate lam of (1) with 

λf g. let z = NEW(FV(ABS(g)) ∪ X) in lam' (f z) z (g z) 

The NEW function takes a finite set of strings, and returns a string not in that set. 

Using (2), the last clause of the recursion theorem becomes 

∀v t. hom(LAM v t) = 
  let z = NEW(FV(LAM v t) ∪ X) in                                 (3) 
  lam' (hom(t[v ↦ VAR(z)])) z (t[v ↦ VAR(z)]) 

This introduces two new free variables into the theorem: lam', which now gets direct ac- 
cess to the result of a recursion, a bound variable and a term body; and X, an additional 
set of variables that is to be avoided in the choice of z. 
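
The shape of clause (3) can be sketched as a higher-order function in Python. This is only an illustration over raw tagged tuples, under two stated assumptions: the fresh-name chooser new is hypothetical, and it avoids every name occurring in the term (bound names included) rather than only FV(LAM v t) ∪ X. That stronger choice still yields a name outside FV(LAM v t) ∪ X, and it lets the substitution t[v ↦ VAR(z)] be implemented naively without risk of capture.

```python
import itertools

def CON(k): return ("CON", k)
def VAR(s): return ("VAR", s)
def APP(t, u): return ("APP", t, u)
def LAM(v, t): return ("LAM", v, t)

def names(term):
    """All variable names occurring in a raw term, free or bound."""
    tag = term[0]
    if tag == "CON":
        return set()
    if tag == "VAR":
        return {term[1]}
    if tag == "APP":
        return names(term[1]) | names(term[2])
    return {term[1]} | names(term[2])

def new(avoid):
    """Hypothetical NEW: first z<i> not in the finite set avoid."""
    for i in itertools.count():
        candidate = "z%d" % i
        if candidate not in avoid:
            return candidate

def rename(v, z, term):
    """t[v |-> VAR z]; capture-free only because z does not occur in term."""
    tag = term[0]
    if tag == "CON":
        return term
    if tag == "VAR":
        return VAR(z) if term[1] == v else term
    if tag == "APP":
        return APP(rename(v, z, term[1]), rename(v, z, term[2]))
    if term[1] == v:                 # v is rebound: no free occurrences below
        return term
    return LAM(term[1], rename(v, z, term[2]))

def make_hom(con, var, app, lam1, X=frozenset()):
    """Build hom from helper functions, with the LAM clause in the style of (3)."""
    def hom(term):
        tag = term[0]
        if tag == "CON":
            return con(term[1])
        if tag == "VAR":
            return var(term[1])
        if tag == "APP":
            t, u = term[1], term[2]
            return app(hom(t), hom(u), t, u)
        v, body = term[1], term[2]
        z = new(names(term) | set(X))
        renamed = rename(v, z, body)
        return lam1(hom(renamed), z, renamed)
    return hom

size = make_hom(lambda k: 1, lambda s: 1,
                lambda rt, ru, t, u: 1 + rt + ru,
                lambda rt, z, t: 1 + rt)
binders = make_hom(lambda k: [], lambda s: [],
                   lambda rt, ru, t, u: rt + ru,
                   lambda rt, z, t: [z] + rt)

term = LAM("x", LAM("y", VAR("x")))
print(size(term))      # 3
print(binders(term))   # ['z0', 'z1']
```

The size instance never looks at z, so the renaming is invisible in its result. The binders instance does look at z, and its output exposes the fresh names rather than the original x and y, which shows why a definition in this style is still awkward to use directly.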

This is a generalisation of the technique that Gordon and Melham use in [5] to 
define their Lgh (“length”) function (similar to size in Section 2 above). The extra X 
parameter will be vital in defining permutation in Section 4 below. It is also important 
to have access to the new name z, which stands in for a bound variable that has been 
renamed to be fresh. Though the new recursion theorem has made the types involved in 
the LAM clause slightly more palatable, the recursive call in the LAM clause is still over 
a term that has had a substitution applied to it. 

In the case of size (as done in [5]), it is possible to separately prove by induction 
that size is invariant under variable renamings, that size(t[v ↦ VAR(y)]) = size(t). 

This simplifies the LAM-clause so that the reference to the fresh z can disappear. This 
trick is not strong enough in general. It doesn’t work for the stripc example function, 
as it is not the case that stripc(t[v ↦ VAR(y)]) = stripc(t). Still, the size 
function is needed to perform induction on the size of terms, and so this first attempt at 
a definitional principle is used to define the size constant. This preliminary principle 
also helps with the definition of swap (see below), before being discarded. 

4 Permutation in HOL 

Next, the system must be extended with definitions of name permutation for all of the 
relevant types^. Because variables in the λ-calculus terms are of type string, names are 
taken to be strings. The basic action of permutation on strings is simple to define: 

swapstr x y s = if x = s then y else (if y = s then x else s) 

Defining a swap function over terms requires the use of the new version of the LAM-clause 
(3) and the original principle (1), with the following instantiations^: 

var ↦ λs. VAR(swapstr x y s) 

con ↦ CON 

app ↦ λrt ru t u. APP rt ru 

X ↦ {x, y} 

lam' ↦ λrt v t. LAM v rt 

After generalising over x and y, and then applying the Axiom of Choice (skolemising), 
this results in the following theorem: 

∃swap. ∀x y. 
  (∀s. swap x y (VAR s) = VAR(swapstr x y s)) ∧ 
  (∀k. swap x y (CON k) = CON k) ∧ 
  (∀t u. swap x y (APP t u) = APP (swap x y t) (swap x y u)) ∧    (4) 
  (∀v t. swap x y (LAM v t) = 
     let z = NEW(FV(LAM v t) ∪ {x, y}) in 
     LAM z (swap x y (t[v ↦ VAR(z)]))) 

This suffices as a definition for a new constant swap, but the LAM-clause is unacceptable 
as it stands. It needs to be shown that 

swap x y (LAM v t) = LAM (swapstr x y v) (swap x y t) 

This can be done by first showing that swap distributes over substitutions of variables 
for variables: 

swap x y (t[u ↦ VAR(v)]) = 
  (swap x y t)[swapstr x y u ↦ VAR(swapstr x y v)]                (5) 

^ Gabbay and Pitts use the notation (xy) • t to mean the permutation of x and y in t, where x and 
y are names and t is generally of any type. In what follows, I use a wordier, but more explicit, 
notation, where each swapping function is given a different name depending on the type of the 
third argument. 

^ The rt and ru names are chosen because these parameters correspond to the results of recursive 
calls on t and u parameters respectively. 
Glossing over some of the details, this theorem suffices because it allows the swap and 
the substitution to move past each other in the LAM-clause of (4), and for the LAM z (...) 
there to be recognised as equal (through α-equivalence) to LAM (swapstr x y v) (...). 
The proof of (5) is by induction on the size of t. 

The next important property of swap is that it can be used instead of substitution 
when a fresh variable is being substituted for another: 

v ∉ FV(t) ⇒ t[u ↦ VAR(v)] = swap u v t                            (6) 

This means that α-equivalence can be expressed using swap: 

v ∉ FV(t) ⇒ LAM u t = LAM v (swap u v t) 

This much confirms that the Gordon-Melham λ-calculus terms can be equipped with a 
permutation action that behaves as the Gabbay-Pitts theory requires. 
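
The relationship between swap and α-equivalence can be checked on small examples. The Python sketch below assumes raw tagged-tuple terms, on which α-equivalence is not built in, and therefore defines a checker aeq in the usual permutation style; aeq and the tuple representation are illustrative assumptions, not part of the HOL theory, where equality on terms already is α-equivalence.

```python
def CON(k): return ("CON", k)
def VAR(s): return ("VAR", s)
def APP(t, u): return ("APP", t, u)
def LAM(v, t): return ("LAM", v, t)

def fv(term):
    tag = term[0]
    if tag == "CON":
        return set()
    if tag == "VAR":
        return {term[1]}
    if tag == "APP":
        return fv(term[1]) | fv(term[2])
    return fv(term[2]) - {term[1]}

def swapstr(x, y, s):
    return y if s == x else (x if s == y else s)

def swap(x, y, term):
    """Permute names x and y throughout a term, including at binders."""
    tag = term[0]
    if tag == "CON":
        return term
    if tag == "VAR":
        return VAR(swapstr(x, y, term[1]))
    if tag == "APP":
        return APP(swap(x, y, term[1]), swap(x, y, term[2]))
    return LAM(swapstr(x, y, term[1]), swap(x, y, term[2]))

def aeq(t, u):
    """Alpha-equivalence of raw terms, decided with swap instead of substitution."""
    if t[0] != u[0]:
        return False
    tag = t[0]
    if tag in ("CON", "VAR"):
        return t[1] == u[1]
    if tag == "APP":
        return aeq(t[1], u[1]) and aeq(t[2], u[2])
    a, b = t[1], u[1]
    if a == b:
        return aeq(t[2], u[2])
    return a not in fv(u[2]) and aeq(t[2], swap(a, b, u[2]))

# v is bound inside t but not free in it, so LAM u t = LAM v (swap u v t).
t = APP(VAR("u"), LAM("v", VAR("v")))
print(aeq(LAM("u", t), LAM("v", swap("u", "v", t))))   # True
print(aeq(LAM("u", VAR("u")), LAM("v", VAR("u"))))     # False
```

Note that swap needs no freshness bookkeeping at binders: it simply renames them too, which is what lets it move through terms more easily than capture-avoiding substitution.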

5 A New Recursion Principle 

The aim of this work is the proof of a recursion principle with a LAM clause that looks, 
as much as possible, like 

∀v t. hom(LAM v t) = lam' (hom(t)) v t                            (7) 

How does one start with (3), that is: 

∀v t. hom(LAM v t) = 
  let z = NEW(FV(LAM v t) ∪ X) in 
  lam' (hom(t[v ↦ VAR(z)])) z (t[v ↦ VAR(z)]) 

and derive (7)? And what extra side-conditions need to be added to make the transformation 
valid? 

A simple examination of the two formulas suggests that the desired strategy would 
be to pull out the substitutions so that there was just one, at the top-level underneath 
the let, and to then have that substitution “evaporate” somehow. The essence of the 
principle-to-come is the side-condition that allows this. 

The first observation is that permutations move around terms much more readily 
than substitutions. Secondly, the freshness of z (it is the result of a call to NEW) and (6) 
mean that the substitutions in (3) can be replaced by permutations, giving 

∀v t. hom(LAM v t) = 
  let z = NEW(FV(LAM v t) ∪ X) in 
  lam' (hom(swap z v t)) (swapstr z v v) (swap z v t) 

To move the swap terms upwards, one would clearly need that 

hom(swap x y t) = swap x y (hom(t))                               (8) 

and that 

lam' (swap x y t1) (swapstr x y s) (swap x y t2) = swap x y (lam' t1 s t2) 
The final stage is getting the swap x y to “evaporate”. The obvious property to appeal 
to is 

x ∉ FV(t) ∧ y ∉ FV(t) ⇒ swap x y t = t                            (9) 

Note the abuse of notation in this discussion of strategy: the two swap functions 
in (8) have different types. On the left, swap swaps strings in a λ-calculus term; on the 
right, swap is swapping strings in the result type. There are also two different swaps 
in the formula stating the desired commutativity of lam'. Finally, the swap in (9) is also 
over the result type. The final theorem has a side-condition requiring that the result type 
has swap and FV functions that behave appropriately. This notion of appropriateness is 
encoded in the swapping predicate, which specifies the properties that a permutation 
action and an accompanying free-variable function must satisfy: 

swapping sw fv = 
  (∀x z. sw x x z = z) ∧ 
  (∀x y z. sw x y (sw x y z) = z) ∧                               (10) 
  (∀x y z. x ∉ fv(z) ∧ y ∉ fv(z) ⇒ sw x y z = z) ∧ 
  (∀x y z s. s ∈ fv(sw x y z) = (swapstr x y s) ∈ fv(z)) 
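
The four conjuncts of (10) can be tested, though of course not proved, on finite samples. The Python sketch below does this; the function name swapping_on_samples and the choice of sample sets are assumptions made for illustration, and a True result only means that no counterexample was found among the samples.

```python
from itertools import product

def swapstr(x, y, s):
    return y if s == x else (x if s == y else s)

def swapping_on_samples(sw, fv, names, values):
    """Check the four conjuncts of (10) for x, y, s drawn from names, z from values."""
    for x, z in product(names, values):
        if sw(x, x, z) != z:                                   # sw x x z = z
            return False
    for x, y, z in product(names, names, values):
        if sw(x, y, sw(x, y, z)) != z:                         # involution
            return False
        if x not in fv(z) and y not in fv(z) and sw(x, y, z) != z:
            return False                                       # fresh names do nothing
        for s in names:
            if (s in fv(sw(x, y, z))) != (swapstr(x, y, s) in fv(z)):
                return False                                   # fv commutes with sw
    return True

names = ["a", "b", "c"]

# Strings as names: swapstr with the singleton free-name function.
print(swapping_on_samples(swapstr, lambda s: {s}, names, names))             # True
# Null instantiation, suitable for types such as numbers or booleans.
print(swapping_on_samples(lambda x, y, z: z, lambda z: set(), names, [0, 1]))  # True
# Mismatched pair: a real permutation with an everywhere-empty fv fails.
print(swapping_on_samples(swapstr, lambda s: set(), names, names))           # False
```

The last call fails on the third conjunct: if nothing is ever reported as free, the permutation is obliged to be the identity.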

The final recursion principle is presented in Figure 1. The rest of this section ex- 
plains some of its details, and comments on its proof. 

5.1 Parameters 

In the presence of additional parameters, satisfying (8) becomes more difficult. This is 
clear with the example function sub' (and normal substitution as well). If hom is taken 
to be sub' M v, then (8) is not true. The action of the permutations must be allowed 
to affect the parameters. In the case of sub', the appropriate theorem is actually 

x ≠ "f" ∧ y ≠ "f" ⇒ 
swap x y (sub' M v t) = sub' (swap x y M) (swapstr x y v) (swap x y t) 

In general, not only does the result type of the desired function need swap and FV 
functions, but so too do any parameters. The final recursion principle explicitly acknowledges 
one unspecified parameter type, and both the final hom function, as well as 
the con, var, app and lam values all now take an additional parameter^. 

It is easy to specify a permutation action for a function type, if one has permutation 
actions for its domain and range types. This is done below in the definition of swapfn. 
Given this, one might wonder why the final recursion principle needs its explicit treatment 
of parameters: is it possible instead to simply require that the range of the new 
function support a permutation action, and expect the use of swapfn to specify this 
when there are extra parameters? Unfortunately, this is not possible: the problem does 
not arise in the requirement that the new function respect permutations, but rather in 
the requirement that it not generate too many fresh names (see the second block of 
antecedents in Figure 1). 

^ One parameter is sufficient: additional curried parameters can be dealt with by first showing 
the existence of an isomorphic uncurried, or tupled, version of the function. The no parameter 
case is obtained by letting the parameter type be the singleton type one, also known as unit. 
swapping rswap rFV ∧ swapping pswap pFV ∧ 
FINITE X ∧ (∀p. FINITE (pFV p)) ∧ 

(∀k p. rFV (con k p) ⊆ X ∪ pFV p) ∧ 
(∀s p. rFV (var s p) ⊆ {s} ∪ pFV p ∪ X) ∧ 
(∀t' u' t u p. 
   (∀p. rFV (t' p) ⊆ FV t ∪ pFV p ∪ X) ∧ 
   (∀p. rFV (u' p) ⊆ FV u ∪ pFV p ∪ X) ⇒ 
   rFV (app t' u' t u p) ⊆ FV (APP t u) ∪ pFV p ∪ X) ∧ 
(∀t' v t p. 
   (∀p. rFV (t' p) ⊆ FV t ∪ pFV p ∪ X) ⇒ 
   rFV (lam t' v t p) ⊆ FV (LAM v t) ∪ pFV p ∪ X) ∧ 

(∀k x y p. 
   x ∉ X ∧ y ∉ X ⇒ 
   (rswap x y (con k p) = con k (pswap x y p))) ∧ 
(∀s x y p. 
   x ∉ X ∧ y ∉ X ⇒ 
   (rswap x y (var s p) = var (swapstr x y s) (pswap x y p))) ∧ 
(∀t t' u u' x y p. 
   x ∉ X ∧ y ∉ X ⇒ 
   (rswap x y (app t' u' t u p) = 
      app (swapfn pswap rswap x y t') (swapfn pswap rswap x y u') 
          (swap x y t) (swap x y u) (pswap x y p))) ∧ 
(∀t' t x y v p. 
   x ∉ X ∧ y ∉ X ⇒ 
   (rswap x y (lam t' v t p) = 
      lam (swapfn pswap rswap x y t') (swapstr x y v) (swap x y t) 
          (pswap x y p))) ⇒ 

∃hom. 
  (∀k p. hom (CON k) p = con k p) ∧ 
  (∀s p. hom (VAR s) p = var s p) ∧ 
  (∀t u p. hom (APP t u) p = app (hom t) (hom u) t u p) ∧ 
  (∀v t p. 
     v ∉ X ∪ pFV p ⇒ 
     (hom (LAM v t) p = lam (hom t) v t p)) ∧ 

  (∀t p x y. 
     x ∉ X ∧ y ∉ X ⇒ 
     (hom (swap x y t) p = rswap x y (hom t (pswap x y p)))) ∧ 
  (∀t p. rFV (hom t p) ⊆ FV t ∪ pFV p ∪ X) 

Fig. 1. The recursion principle for λ-calculus terms. The second block of antecedents requires that 
the function not create too many fresh names. The third block requires that the function respect 
permutation. The second block of properties in the conclusion states that these properties do hold 
for the resulting hom function. For the definition of swapping, see (10). 
Consider defining the substitution function, where the free names of the additional 
parameters may appear in the result. Without using the parameter information, it is 
impossible to provide a free variable function (rFV in Figure 1) for the result-type (a 
function-space) that will satisfy the new principle’s antecedents. Such an rFV must 
simultaneously return small enough sets of names to satisfy the second block of antecedents, 
and also have an accompanying permutation action, rswap. This permutation 
action must satisfy the requirements embodied in swapping and the third block 
of antecedents. For example, the “null” instantiation, taking rFV to always return the 
empty set, in turn requires rswap to be the identity function (because of the third conjunct 
of swapping’s definition (10)), and thus fails to satisfy 

rswap x y (var s) = var (swapstr x y s) 

where 

var = λs v M. if s = v then M else (VAR s) 



5.2 Proving the Theorem 

To begin the derivation of the final recursion principle, it is necessary to return to (1) 
and instantiate lam with 

λf g p. let z = NEW(FV(ABS(g)) ∪ pFV(p) ∪ X) in lam' (f z) z (g z) p 

As before, ABS(g) is equal to the original term, so that z is now fresh with respect to 
it as well as the parameter. The set of strings to avoid for p’s sake is given by the pFV 
function. The finite set X is used to avoid those free names that are somehow implicit 
in the function itself. Such a name is the "f" present in the definition of the sub' 
example. 

When the substitutions of (1) are replaced with permutations, the LAM-clause becomes 

∀v t p. hom (LAM v t) p = 
  let z = NEW(FV(LAM v t) ∪ pFV(p) ∪ X) in 
  lam' (hom(swap z v t)) (swapstr z v v) (swap z v t) p 

The strategy sketched at the beginning of this section is still the right way to proceed, 
even after the complication of parameters has been introduced. Its first stage is 
to move permutations upwards in the above clause, appealing to commutativity results. 
The third block of antecedents in the final principle allows this to occur. This block 
also features the use of swapfn, which defines permutation on a function space, given 
permutation actions for the domain and range type. Its definition is 

swapfn dsw rsw x y f = λz. rsw x y (f (dsw x y z)) 

Use of swapfn is required because the result type of hom is a function space, so that 
in expressions such as app (hom(t)) (hom(u)) t u p, the first two arguments to app are 
functions. 
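
The definition of swapfn transcribes almost literally into Python. The sketch below uses swapstr for both the domain and range actions; the two example functions (a constant function and the identity on names) are illustrative choices, not taken from the paper.

```python
def swapstr(x, y, s):
    return y if s == x else (x if s == y else s)

def swapfn(dsw, rsw, x, y, f):
    """Permutation action on functions: permute the argument, apply f, permute the result."""
    return lambda z: rsw(x, y, f(dsw(x, y, z)))

# A function that mentions the name "a" is changed by swapping "a" and "b".
always_a = lambda s: "a"
g = swapfn(swapstr, swapstr, "a", "b", always_a)
print(g("c"))   # b

# The identity on names mentions no names, so the swap leaves it unchanged.
h = swapfn(swapstr, swapstr, "a", "b", lambda s: s)
print([h(s) for s in ["a", "b", "c"]])   # ['a', 'b', 'c']
```

The first example is the function-space analogue of a term with a free name: the constant "a" behaves like a free occurrence of a, and is renamed accordingly.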
The final part of the strategy is to appeal to (9). The strategy is to have the swap 
terms in 

let z = NEW(FV(LAM v t) ∪ pFV(p) ∪ X) in 
rswap z v (lam' (hom(t)) v t (pswap z v p)) 

“evaporate”. The variable v is the original bound variable of the abstraction, and the 
condition on the equation being derived requires it to not be present in the free variables 
of p. Variable z shares this property by construction, so (pswap z v p) can be 
replaced by p. The rswap z v term can only be eliminated if the side-conditions in the 
theorem ensure that hom, and thus all of the helper functions, do not generate too many 
new names in the result. This is guaranteed by the second block of antecedents in the 
recursion principle. 

The proof consists of showing that the hom known to exist from the original principle 
has the properties specified in the final recursion principle. It begins by showing 
that the new function doesn’t produce too many free variables, i.e., that the very last 
conjunct of the theorem’s conclusion holds. This proof is by induction, using the original 
Gordon-Melham induction principle. Next, the commutativity result is shown (the 
second to last conjunct). This is done by an induction on the size of the term. Finally, both 
of these results are used according to the strategy described above, to prove the nice 
form of the LAM-clause. 

The course of the proof of the final recursion principle also reveals exactly what 
properties are needed of the swap and FV functions in the theorem’s two other types 
(parameter and range). These properties are defined by the predicate swapping (10) 
that appears in the final recursion principle. 

6 Application and Implementation 

The final recursion principle allows the definition of all of the functions given in Sec- 
tion 2. Further, I have implemented a tool to automatically attempt those definitions 
where there is just one parameter. This means that definitions for all the examples ex- 
cept v_posns and sub' can be made entirely automatically. The tool does not cope 
with parameterised definitions because I have yet to implement the (rather uninteresting) 
logic that would translate something like f x t z into f t (x, z), and back again, as 
required. Currently, my code also always instantiates the X parameter with the empty 
set. 

The implemented code includes a rudimentary database of types, which is used 
to provide appropriate permutation and swapping information about result types. The 
function definition tool can therefore instantiate all of the recursion principle without 
user intervention. For the simple examples, such as size, and even enf, the result 
type is one that doesn’t support a swapping action. It is easy to see that the null-swap, 
(λx y z. z), and the everywhere-empty free variable function (λx. ∅) satisfy the requirements 
of swapping in (10). Such an instantiation also leads to immediate simplifications 
in the final recursion principle itself. For example, the second block of antecedents 
completely disappears. 

After instantiation, the tool must try to discharge the side conditions. Clearly, arbitrary 
definition attempts might produce side-conditions that no automatic tool could be 
expected to discharge. At the moment, however, all of the one-parameter cases (those 
given above, and also all those that arose in my formalisation of the standardisation 
theorem) are handled by the tool with little more than a call to the standard simplifier, 
appropriately augmented with relevant rewrite theorems. 

While the code cannot yet cope with functions of more than one parameter, it is not 
difficult to instantiate the final recursion principle by hand. For sub', the instantiation 
of the helper functions is as follows (where I have arbitrarily decided that the parameter 
type pairs the string and the term in that order): 

var ↦ λs (v, M). if s = v then M else VAR(s) 

con ↦ λk p. CON(k) 

app ↦ λrt ru t u p. APP (rt p) (ru p) 

lam ↦ λrt u t (v, M). LAM u (rt (v, APP (VAR("f")) M)) 

These instantiations require no creative thought to calculate, and it is clear that an auto- 
matic tool to do this work would be straightforward to implement. 

Similarly, the existing database, mapping types to likely swapping and free variable 
functions, makes it clear what the instantiations for the following variables should be: 

rswap ↦ swap 

rFV ↦ FV 

pswap ↦ λx y. (swapstr x y × swap x y) 

pFV ↦ λ(v, M). {v} ∪ FV(M) 

Finally, sub' requires X to be {"f"}^. 
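
The hand instantiation above can be exercised with a small executable model of the conclusion of Figure 1. The Python sketch below is an illustration only: it works on raw tagged tuples rather than the HOL type, the fresh-name chooser new is hypothetical, and the function make_hom is an assumed stand-in for the hom whose existence the theorem asserts. Its LAM case applies the nice clause when the bound variable lies outside X ∪ pFV p, and otherwise first renames the binder by a swap.

```python
import itertools

def CON(k): return ("CON", k)
def VAR(s): return ("VAR", s)
def APP(t, u): return ("APP", t, u)
def LAM(v, t): return ("LAM", v, t)

def fv(term):
    tag = term[0]
    if tag == "CON":
        return set()
    if tag == "VAR":
        return {term[1]}
    if tag == "APP":
        return fv(term[1]) | fv(term[2])
    return fv(term[2]) - {term[1]}

def swapstr(x, y, s):
    return y if s == x else (x if s == y else s)

def swap(x, y, term):
    tag = term[0]
    if tag == "CON":
        return term
    if tag == "VAR":
        return VAR(swapstr(x, y, term[1]))
    if tag == "APP":
        return APP(swap(x, y, term[1]), swap(x, y, term[2]))
    return LAM(swapstr(x, y, term[1]), swap(x, y, term[2]))

def new(avoid):
    """Hypothetical NEW: first z<i> not in the finite set avoid."""
    for i in itertools.count():
        name = "z%d" % i
        if name not in avoid:
            return name

def make_hom(con, var, app, lam, pFV, X):
    """Model of hom: helper functions receive recursive results as functions of p."""
    def hom(term, p):
        tag = term[0]
        if tag == "CON":
            return con(term[1], p)
        if tag == "VAR":
            return var(term[1], p)
        if tag == "APP":
            t, u = term[1], term[2]
            return app(lambda q: hom(t, q), lambda q: hom(u, q), t, u, p)
        v, body = term[1], term[2]
        if v in X | pFV(p):                       # side-condition fails: rename first
            z = new(fv(term) | pFV(p) | X)
            v, body = z, swap(z, v, body)
        return lam(lambda q: hom(body, q), v, body, p)
    return hom

# Instantiation for sub', with the parameter p = (v, M).
hom = make_hom(
    con=lambda k, p: CON(k),
    var=lambda s, p: p[1] if s == p[0] else VAR(s),
    app=lambda rt, ru, t, u, p: APP(rt(p), ru(p)),
    lam=lambda rt, u, t, p: LAM(u, rt((p[0], APP(VAR("f"), p[1])))),
    pFV=lambda p: {p[0]} | fv(p[1]),
    X={"f"},
)

def sub_prime(M, v, t):
    return hom(t, (v, M))

print(sub_prime(VAR("a"), "v", LAM("x", VAR("v"))))
print(sub_prime(VAR("a"), "v", LAM("f", APP(VAR("f"), VAR("v")))))
```

In the second call the bound variable "f" belongs to X, so the binder is renamed to a fresh name before the lam helper is applied; the answer is therefore determined only up to α-equivalence on this raw representation.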

The instantiation for sub' above creates quite a complicated instance of the final 
recursion principle. Nonetheless, the derived side-conditions are easy to eliminate. 

The definition of functions such as rator, where values for whole classes of argu- 
ment are left unspecified, brings up one last wrinkle. The database storing information 
about each type should record a value in each type that has no free names (if possible: 
the type string has no such value). This value can then be provided as the result 
value for the omitted constructors. If this can’t be done then the X parameter will need 
to be instantiated to cover the extra free names present in whatever value was chosen to 
be the value in the unspecified cases. This is something to avoid if possible, because it 
results in the commutativity result in the conclusion of the recursion principle retaining 
its annoying side-conditions. 



7 Related Work 

There are three pieces of work closely related to the topic of this paper. All explicitly 
concern themselves with the specification of a recursion (or “iteration”) principle for 
types with binders and α-equivalence, and all three apply the developed theory in a 
mechanised setting. Two are the inspiration for my own work: Gordon and Melham [5], 

^ A general rule for the calculation of X might be to include in X any names mentioned explicitly 
in a proposed definition. This question probably doesn’t warrant much investigation, as 
functions like sub', with their own free names, seem unlikely to arise in practice. 
and the Gabbay-Pitts theory of Fraenkel-Mostowski sets, particularly §10.3 of Gabbay’s 
PhD thesis [3]. The third is work by Ambler, Crole and Momigliano in [1]. 

Clearly, this work would have been impossible without the underlying Gordon-Melham 
characterisation of λ-calculus terms up to α-equivalence. My claim is that, 
complicated side-conditions notwithstanding, the final recursion principle in Figure 1 
is an improvement on the original Gordon-Melham principle (1). This is because the 
new principle has a conclusion that allows functions to be defined in a way that much 
more closely approximates familiar and traditional principles of primitive recursion. 

Inasmuch as the new principle embodies restrictions imported from the theory of 
FM-sets, it cannot define all of those functions definable with the original principle. 
For example, neither principle will support the definition of a function with clause 

f (LAM v t) = t 

because this is unsound. But it is not difficult to use (1) to define a function with clause 

f (LAM v t) = t[v ↦ NEW(FV(LAM v t))] 

This returns the body of the abstraction with an arbitrary, fresh name substituted through 
for the bound variable. Appealing as it does to the Axiom of Choice, in a way that would 
allow the enumeration of all names, this function is impossible to define in the Fraenkel-Mostowski 
theory, and also impossible to define using Figure 1’s recursion principle. 

My work has been greatly inspired by the theory of permutations developed by 
Gabbay and Pitts. It might be characterised as an attempt to bring the nice features 
of this FM-theory into the world of classical higher-order logic. In this “HOL world”, 
one need not give up the Axiom of Choice. Nor need one assert that the set of names 
is infinite, but that its subsets are all either finite or have finite complements. Instead 
those axioms of the FM-theory that are absolutely necessary for function definition in 
the primitive recursive style are imported as side-conditions. These side-conditions are 
easily discharged for definitions that are well-behaved; seemingly the vast majority. For 
those definitions that are not so well behaved, the HOL resident can always resort to (1); 
in the world of FM-sets, these definitions remain inadmissible. The only significant loss 
in the classical setting would appear to be the И quantifier, or at least its nice properties, 
such as (Иx. ¬P) = ¬(Иx. P). 

Another possible advantage of the approach described here is that the user is able 
to choose the instantiations for pswap and rswap on a case-by-case basis. If, for 
example, a function used strings in a way unconnected with their role as names, one 
wouldn’t provide swapstr as the permutation function, but rather the null swapping 
function (λx y z. z). This freedom may or may not be significant in practice. A 
related idea, though one that also loses this flexibility, might be to use Isabelle/HOL’s 
axiomatic type classes to automatically associate types with appropriate permutation 
and free-name functions, thereby allowing the swapping side conditions in the recur- 
sion principle to disappear. 

Finally, recent work by Ambler, Crole and Momigliano [1] presents a recursion 
principle for a (weak) higher-order abstract syntax view of the untyped λ-calculus (in 
classical Isabelle/HOL). This work gets around some of the typical problems associated 
with higher-order abstract syntax by working with terms-in-infinite-contexts, thereby 
providing a type of terms A where the function space (var → A) is isomorphic to the 
original type A. This is achieved by making A itself a function space: (var^∞ → A_0), for 
an underlying algebraic type A_0. Ambler et al. then define an inductive relation prop 
that isolates the “proper” or non-exotic values of A. In order to retain the use of meta-level 
functions in the object syntax, the proper terms are not used as the basis for the 
definition of a new type. So, while A still includes exotic terms, prop allows them to be 
identified. 

The recursion combinator is a perfect instance of primitive recursion in its behaviour 
under the abstraction binder, but the extra infinite context parameters add complexity. 
When passing under a binder in the definition of substitution, for example, variable 
indices need to be incremented in both the term being substituted and the body being 
substituted into. This is rather reminiscent of the de Bruijn implementation of substitu- 
tion. 

The work by Ambler et al. is the first to prove a recursion principle for function 
definition over (weak) higher-order abstract syntax. Their paper provides pointers to a 
number of other HOAS approaches to the problem. Work by Schürmann, Despeyroux, 
and Pfenning [8], and by Washburn and Weirich [9], exemplifies one such approach. 
In this work, sophisticated type systems (modal λ-calculus, and first-class parametric 
polymorphism respectively) prevent the untrammeled use of function-spaces, thereby 
avoiding exotic terms while allowing iteration over these structures. Meta-theoretic reasoning 
about such embedded systems (e.g., proving results akin to the standardisation 
theorem for the untyped λ-calculus) remains a challenging area for future research. 



8 Conclusion 

I have presented a new recursion principle for the type of λ-terms that allows the 
ready definition of recursive functions over these terms. It has been proved in HOL, 
and motivates a definitional technique that looks as much as possible like primitive 
recursion. The validity of recursions that pass under binders is ensured by appeal to 
side-conditions that embody restrictions based on the ideas of Gabbay’s and Pitts’s 
Fraenkel-Mostowski set theory. 

I have further implemented a small HOL library that allows users to write definitions 
in the obvious “pattern-matching” style, and which automatically discharges the FM-related 
side conditions. This is done with the help of a small database mapping types to 
information about how they support permutation and the notion of free names. 

The theorem and the library have been tested on the definitions made in the course 
of an earlier project mechanising a substantial piece of λ-calculus theory. A sample of 
representative functions (all of which the theorem handles) is presented in Section 2 
above. 

Versions of the recursion principle for other types with binders are easy to state: in 
the antecedents, they simply require that all the functions standing in for the constructors 
of the new type (the equivalents of var, con, app and lam in Figure 1) not generate 
too many fresh names, and that they and permutations commute. In the conclusion of 
these theorems, the equations for constructors that are binders acquire side conditions 
stating that the recursive characterisation is invalid for finitely many choices of bound 
variable name. Automating the proof of such theorems is the key task in being able to 
define new types with binders automatically. 

Future work. In the short term, I hope to soon extend the implementation of the library 
to support the definition of functions with more than one parameter. This is not concep- 
tually difficult; the work required is simply that of implementing transformations such 
as moving from tupled to curried arguments, and switching parameter orders. 

A recursion principle that supported the definition of well-founded functions would 
also be very useful. This would allow definitions that recursed at arbitrary depths un- 
der binders. HOL’s existing implementation of definition by well-founded recursion 
requires that constructors be injective, something not true of binders. 

A more significant project is the development of theory and code to support the 
establishment of new types with binders. It is not difficult to establish new types by 
hand, and it is also clear what the recursion principle for new types should be. The 
challenge will be establishing types automatically, including the proof of their recursion 
principles. 

Availability. All of the theory and code described in this paper will be available as part 
of the next distribution of the HOL system. 



References 

1. Simon J. Ambler, Roy L. Crole, and Alberto Momigliano. A definitional approach to 
primitive recursion over higher order abstract syntax. In Honsell et al. [6]. Available at 
http://doi.acm.org/10.1145/976571.976572. 

2. H. P. Barendregt. The Lambda Calculus: its Syntax and Semantics, volume 103 of Studies in 
Logic and the Foundations of Mathematics. Elsevier, Amsterdam, revised edition, 1984. 

3. M. J. Gabbay. A Theory of Inductive Definitions with Alpha-Equivalence. PhD thesis, 
University of Cambridge, 2001. 

4. M. J. Gabbay and A. M. Pitts. A new approach to abstract syntax involving binders. In 14th 
Annual Symposium on Logic in Computer Science, pages 214-224. IEEE Computer Society 
Press, Washington, 1999. 

5. A. D. Gordon and T. Melham. Five axioms of alpha conversion. In J. von Wright, J. Grundy, 
and J. Harrison, editors, Theorem Proving in Higher Order Logics: 9th International Conference, 
TPHOLs’96, volume 1125 of Lecture Notes in Computer Science, pages 173-190. 
Springer-Verlag, 1996. 

6. Furio Honsell, Marino Miculan, and Alberto Momigliano, editors. Merlin 2003, Proceedings 
of the Second ACM SIGPLAN Workshop on Mechanized Reasoning about Languages with 
Variable Binding. ACM Digital Library, 2003. 

7. Michael Norrish. Mechanising Hankin and Barendregt using the Gordon-Melham axioms. In 
Honsell et al. [6]. Available at http://doi.acm.org/10.1145/976571.976577. 

8. Carsten Schürmann, Joëlle Despeyroux, and Frank Pfenning. Primitive recursion for higher-order 
abstract syntax. Theoretical Computer Science, 266(1-2):1-57, September 2001. 

9. Geoffrey Washburn and Stephanie Weirich. Boxes go bananas: Encoding higher-order abstract 
syntax with parametric polymorphism. In ICFP ’03: Proceedings of the Eighth ACM SIGPLAN 
International Conference on Functional Programming, pages 249-262. ACM Press, 2003. 




Abstractions for Fault-Tolerant 
Distributed System Verification 



Lee Pike^1, Jeffrey Maddalon^1, Paul Miner^1, and Alfons Geser^2 

^1 Formal Methods Group 
NASA Langley Research Center 
M/S 130, Hampton, VA 23681-2199 
{lee.s.pike, j.m.maddalon, paul.s.miner}@nasa.gov 
^2 National Institute of Aerospace 
144 Research Drive, Hampton, VA 23666 
geser@nianet.org 



Abstract. Four kinds of abstraction for the design and analysis of fault- 
tolerant distributed systems are discussed. These abstractions concern 
system messages, faults, fault-masking voting, and communication. The 
abstractions are formalized in higher-order logic, and are intended to 
facilitate specifying and verifying such systems in higher-order theorem- 
provers. 



1 Introduction 

In recent years, we have seen tremendous growth in the development of em- 
bedded computer systems with critical safety requirements [10,12], and there 
is no expectation that this trend will abate. For instance, steer-by-wire systems 
are currently being pursued [11]. To withstand faulty behavior, safety-critical 
systems have traditionally employed analog backup systems in case the digi- 
tal system fails; however, many new “by-wire” systems have no analog backup. 
Instead, they rely on integrated digital fault-tolerance. 

Due to their complexity and safety-critical uses, fault-tolerant embedded sys- 
tems require the greatest assurance of design correctness. One means by which 
a design can be shown correct is formal methods. Formal methods are espe- 
cially warranted if we recall that published and peer-reviewed informal 
proofs-of-correctness of seemingly simple fault-tolerant algorithms have been incorrect [16]. 
Here, we focus on formal methods involving higher-order theorem-provers. 

Although many fault-tolerant distributed systems and algorithms have been 
specified and verified, the abstractions used have often been ad-hoc and system- 
specific. Developing appropriate abstractions is often the most difficult and time- 
consuming part of formal methods [26]. We present these abstractions to sys- 
tematize and facilitate the practice of abstraction. 

The abstractions presented are in the spirit of abstractions of digital hardware 
developed by Thomas Melham [19, 18]. They are intended to make specifications 
and their proofs of correctness less tedious [14], less error-prone, and more uni- 
form. Although the abstractions we describe are quite general, we intend for 
them to be accessible to the working verification engineer. 

These abstractions are the outcome of the on-going project, “Scalable 
Processor-Independent Design for Electromagnetic Resilience” (SPIDER), at NASA’s 
Langley Research Center and at the National Institute of Aerospace [29]. 
SPIDER is the basis of an FAA study exploring the use of formal methods, especially 
theorem-proving, in avionics certification. One of the project goals is to specify 
and verify the Reliable Optical Bus (ROBUS), a state-of-the-art fault-tolerant 
communications bus [28, 21]. The abstractions have proved useful in this project, 
and in fact are the basis of a generalized fault-tolerant library of PVS theories 
mentioned in Sect. 8. 

The structure of our paper is as follows. We discuss fault-tolerant distributed 
systems in Sect. 2. Section 3 gives an overview of the four abstractions presented 
in this paper. Sections 4 through 7 explain these abstractions. Each section 
presents an abstraction, and then the abstraction is formalized in higher-order 
logic. We provide some concluding remarks and point toward future work in the 
final section. 

2 Fault-Tolerant Distributed Systems 

Introductory material on the foundations of distributed systems and algorithms 
can be found in Lynch [17]. Some examples of systems that have fault-tolerant 
distributed implementations are databases, operating systems, communication 
busses, file systems, and server groups [3,28,2]. 

A distributed system is modeled as a graph with directed edges. Vertices are 
called processes. Directed edges are called communication channels (or simply 
channels). If channel c points from process p to process p', then p can send 
messages over c to p', and p' can receive messages over c from p. In this context, 
p is the sending process (or sender) and p' is the receiving process (or receiver). 
Channels may point from a process to itself. In addition to sending and receiving 
messages, processes may perform local computation. 
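
The graph model above can be sketched in a few lines of Python. This is only an illustration and not part of the PVS formalization; the names Channel, DistributedSystem, receivers_of, and senders_to are hypothetical and chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Channel:
    """A directed edge: `sender` can send messages over it to `receiver`."""
    sender: str
    receiver: str


@dataclass
class DistributedSystem:
    """A directed graph whose vertices are processes (hypothetical sketch)."""
    processes: set = field(default_factory=set)
    channels: set = field(default_factory=set)

    def add_channel(self, sender, receiver):
        # Adding a channel also registers both endpoints as processes.
        self.processes.update({sender, receiver})
        self.channels.add(Channel(sender, receiver))

    def receivers_of(self, p):
        """Processes to which p can send directly."""
        return {c.receiver for c in self.channels if c.sender == p}

    def senders_to(self, p):
        """Processes from which p can receive directly."""
        return {c.sender for c in self.channels if c.receiver == p}


system = DistributedSystem()
system.add_channel("p", "q")
system.add_channel("q", "p")
system.add_channel("p", "p")  # a channel may point from a process to itself
```

Local computation is deliberately left out of the sketch; only the sender/receiver structure is represented.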

A fault-tolerant system is one that continues to provide the required 
functionality in the presence of faults. One way to implement a fault-tolerant system 
is to use a distributed collection of processes such that a fault that affects one 
process will not adversely affect the whole system’s functionality. This type of 
system is referred to as a fault-tolerant distributed system. 

3 Four Kinds of Abstraction 

We introduce four fundamental abstractions in the domain of fault-tolerant dis- 
tributed systems. Message Abstractions address the correctness of individual 
messages sent and received. Fault Abstractions address the kinds of faults possi- 
ble as well as their effects in the system. Fault-Masking Abstractions address the 
kinds of local computations processes make to mask faults. Finally, Communi- 
cation Abstractions address the kinds of data communicated and the properties 
required for communication to succeed in the presence of faults. 

Our formal expressions are stated in the language of higher-order functions: 
variables can range over functions, and functions can take other functions as 
arguments. Furthermore, we use uninterpreted functions (i.e., functions with no 
defining body) that act as constants when applied to their arguments. Curried 
functions and lambda abstraction are also used. For a brief overview of higher- 
order logic from a practitioner’s perspective, see, for example, Melham [19] or 
the PVS language reference [9]. A small datatype, fully explained in Sect. 4, is 
also used. 

The abstractions have been formalized in the Prototype Verification System 
(PVS), a popular interactive industrial-strength theorem proving system [22,8]. 
They are available at NASA [24]. 

4 Abstracting Messages 

4.1 Abstraction 

Messages communicated in a distributed system are abstracted according to their 
correctness. We distinguish between benign messages and accepted messages. The 
former are messages that a non-faulty receiving process recognizes as incorrect; 
the latter are messages that a non-faulty receiving process does not recognize as 
incorrect. Note that an accepted message may be incorrect: the receiving process 
just does not detect that the message is incorrect. 

Benign messages abstract various sorts of misbehavior. A message that is 
sufficiently garbled during transmission may be caught by an error-checking 
code [7] and deemed benign. Benign messages also abstract the absence of a 
message: a receiver expecting a message but detecting the absence of one takes 
this to be the ‘reception’ of a benign message. In synchronized systems with 
global communication schedules, they abstract messages sent and received at 
unscheduled times. 

4.2 Formalization 

Let the set MSG be a set of messages of a given type. MSG is the base set of 
elements over which the datatype is defined. The set of all possible datatype 
elements is denoted by ABSTRACT_MSG[MSG]. 

The datatype has two constructors, accepted_msg and benign_msg. The former 
takes an element $m \in \mathit{MSG}$ and creates the datatype element accepted_msg[m]. 
The constructor also has an associated extractor value such that 

\[ \mathit{value}(\mathit{accepted\_msg}[m]) = m . \]

The other constructor, benign_msg, is a constant datatype element; it is a 
constructor with no arguments. All benign messages are abstracted as a single 
message; thus, the abstracted incorrect message cannot be recovered. Finally, we 
define two recognizers, accepted_msg? and benign_msg?, with the following 
definitions. Let $a \in \mathit{ABSTRACT\_MSG}[\mathit{MSG}]$. 

\[ \mathit{accepted\_msg?}(a) = \exists m.\; m \in \mathit{MSG} \wedge a = \mathit{accepted\_msg}[m] , \]

and 

\[ \mathit{benign\_msg?}(a) = (a = \mathit{benign\_msg}) . \]

We summarize this datatype in Fig. 1. Let $m \in \mathit{MSG}$. 



Constructors       Extractors   Recognizers 
accepted_msg[m]    value        accepted_msg? 
benign_msg         none         benign_msg? 



Fig. 1. Abstract Messages Datatype 
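
As a rough illustration of the datatype in Fig. 1, the following Python sketch mirrors the two constructors, the extractor, and the two recognizers. It is not the PVS datatype itself; the class and function names (AcceptedMsg, BenignMsg, is_accepted, is_benign) are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

M = TypeVar("M")  # stands for the base set MSG


@dataclass(frozen=True)
class AcceptedMsg(Generic[M]):
    """Constructor accepted_msg[m]; the field `value` plays the extractor's role."""
    value: M


@dataclass(frozen=True)
class BenignMsg:
    """Constructor benign_msg: a constant with no arguments."""


# ABSTRACT_MSG[MSG] is the union of the two constructor ranges.
AbstractMsg = Union[AcceptedMsg, BenignMsg]


def is_accepted(a) -> bool:
    """Recognizer accepted_msg?"""
    return isinstance(a, AcceptedMsg)


def is_benign(a) -> bool:
    """Recognizer benign_msg?"""
    return isinstance(a, BenignMsg)
```

Because BenignMsg carries no payload, any two benign messages compare equal, which reflects the point that the original incorrect message cannot be recovered.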



5 Abstracting Faults 

There are two closely related abstractions with respect to faults. The first ab- 
straction, error types, partitions the possible locations of faults. The second 
abstraction, fault types, partitions faults according to the manifestation of the 
errors caused by the faults^1. 



5.1 Abstracting Error Types 

Picking the right level of abstraction and the right components to which faults 
should be attributed is a modeling issue that has been handled in many differ- 
ent ways. We think this is a particularly good example of the extent to which 
modeling choices can affect specification and proof efficacy. 

Both processes and channels can suffer faults [17], but reasoning about pro- 
cess and channel faults together is tedious. Fortunately, such reasoning is redun- 
dant - channel faults can be abstracted as process faults. A channel between a 
sending process and a receiving process can be abstracted as being an extension 
either of the sender or of the receiver. For instance, a lossy channel abstracted 
as an extension of the sender is modeled as a process failing to send messages. 

Even if we abstract all faults to ones affecting processes and not channels, we 
are left with the task of abstracting how the functionality of a process - sending, 
receiving, or computing - is degraded. One possibility is to consider a process 
as an indivisible unit so that a fault affecting one of its functions is abstracted 
as affecting its other functions, too. Another possibility is to abstract all faults 
to ones affecting a process’ ability to send and receive messages as in [27, 23]. 
Finally, models implicit in [5,16] abstract process faults as being ones affecting 
only a process’ ability to send messages. So even if a fault affects a process’ 
ability to receive messages or compute, the fault is abstractly propagated to a 
fault affecting the process’ ability to send messages. 

All three models above are conservative, i.e., the abstraction of a fault is at 
least as severe as the fault. This is certainly true of the first model in which the 
whole process is considered to be degraded by any fault, and it is true for the 
second model, too. Even though it is assumed that a process can always compute 
correctly, its computed values are inconsequential if it can neither receive nor 
send correct values. As for the third model, the same reasoning applies - even if 
a faulty process can receive messages and compute correctly, it cannot send its 
computations to other processes. 

^1 An error is “that part of the system state which is liable to lead to subsequent 
failure,” while a fault is “the adjudged or hypothesized cause of an error” [15]. 

The model we choose is one in which all faults are abstracted to be ones 
degrading send functionality, and in which channels are abstracted as belonging 
to the sending process. There are two principal advantages to this model, both 
of which lead to simpler specifications and proofs. First, the model allows us 
to disregard faults when reasoning about the ability of processes to receive and 
compute messages. Second, whether a message is successfully communicated is 
determined solely by a process’ send functionality; the faultiness of receivers 
need not be considered. 

5.2 Abstracting Fault Types 

Faults result from innumerable occurrences including physical damage, electro- 
magnetic interference, and “slightly-out-of-spec” communication [4]. We collect 
these fault occurrences into fault types according to their effects in the system. 

We adopt the hybrid fault model of Thambidurai and Park [30]. A process 
is called benign, or manifest, if it sends only benign messages, as described in 
Sect. 4. A process is called symmetric if it sends every receiver the same mes- 
sage, but these messages may be incorrect. A process is called asymmetric, or 
Byzantine [13], if it sends different messages to different receivers. All non-faulty 
processes are also said to be good. 

Other fault models exist that provide more or less detail than the hybrid fault 
model above. The least detailed fault model is to assume the worst case scenario, 
that all faults are asymmetric. The fault model developed by Azadmanesh and 
Kieckhafer [1] is an example of a more refined model. All such fault models are 
consistent with the other abstractions in this paper. 

5.3 Formalization 

We begin by formalizing fault types. Let S and R be sets of processes sending and 
receiving messages, respectively, in a round of communication. Let asym, sym, 
ben, and good be constants representing the fault types asymmetric, symmetric, 
benign, and good, respectively. 

As mentioned, we abstract all faults to ones that affect a process’ ability 
to send messages. To model this formally, we construct a function modeling a 
process sending a message to a receiver. The range of the function is the set of 
abstract messages, elements of the datatype defined in Sect. 4. As explained, 
MSG is a set of messages, and ABSTRACT_MSG[MSG] is the set of datatype 
elements parameterized by MSG. Let $s \in S$ and $r \in R$ be a sending and receiving 
process, respectively. Let msg_map be a function from senders to the message 
they intend to send, and let sender_status be a function mapping senders to 
their fault partition. The function send outputs the abstract message received by r 
from s. 



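\[
\mathit{send}(\mathit{msg\_map}, \mathit{sender\_status}, s, r) =
\begin{cases}
\mathit{accepted\_msg}[\mathit{msg\_map}(s)] & : \mathit{sender\_status}(s) = \mathit{good} \\
\mathit{benign\_msg} & : \mathit{sender\_status}(s) = \mathit{ben} \\
\mathit{sym\_msg}(\mathit{msg\_map}(s), s) & : \mathit{sender\_status}(s) = \mathit{sym} \\
\mathit{asym\_msg}(\mathit{msg\_map}(s), s, r) & : \mathit{sender\_status}(s) = \mathit{asym}
\end{cases}
\]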



If s is good, then r receives an accepted abstract message, defined in Sect. 4, 
from s. If s is benign, then r receives a benign message. In the last two cases 
- in which s suffers a symmetric or asymmetric fault - uninterpreted functions 
are returned. Applied to their arguments, sym_msg and asym_msg are uninterpreted 
constants of the abstract message datatype defined in Sect. 4. The 
function asym_msg models a process suffering an asymmetric fault by taking 
the receiver as an argument: for receivers r and r', asym_msg(msg_map(s), s, r) 
is not necessarily equal to asym_msg(msg_map(s), s, r'). On the other hand, 
the function sym_msg does not take a receiver as an argument, so all receivers 
receive the same arbitrary abstract message from a particular sender. 
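
The case analysis of send can be sketched in Python as follows. In the formalization sym_msg and asym_msg are uninterpreted, so this sketch takes them as parameters; the concrete bodies supplied in the example (adding 1, or adding the length of the receiver's name) are arbitrary stand-ins chosen only to make the behavior observable. The tuple encoding of abstract messages and the names FaultType, accepted, BENIGN, and receive are likewise assumptions of the sketch.

```python
from enum import Enum


class FaultType(Enum):
    GOOD = "good"
    BEN = "ben"
    SYM = "sym"
    ASYM = "asym"


BENIGN = ("benign",)  # stand-in for the constant benign_msg


def accepted(m):
    """Stand-in for the constructor accepted_msg[m]."""
    return ("accepted", m)


def send(msg_map, sender_status, s, r, sym_msg, asym_msg):
    """Abstract message that receiver r gets from sender s."""
    status = sender_status(s)
    if status is FaultType.GOOD:
        return accepted(msg_map(s))
    if status is FaultType.BEN:
        return BENIGN
    if status is FaultType.SYM:
        return sym_msg(msg_map(s), s)       # independent of the receiver
    return asym_msg(msg_map(s), s, r)       # may depend on the receiver


# Example with arbitrary (hypothetical) interpretations of the fault functions.
intended = {"s1": 10, "s2": 10, "s3": 10, "s4": 10}
status = {"s1": FaultType.GOOD, "s2": FaultType.BEN,
          "s3": FaultType.SYM, "s4": FaultType.ASYM}


def sym_msg(m, s):
    return accepted(m + 1)


def asym_msg(m, s, r):
    return accepted(m + len(r))


def receive(s, r):
    return send(intended.__getitem__, status.__getitem__, s, r, sym_msg, asym_msg)
```

With these stand-ins, a symmetric sender delivers the same (possibly wrong) message to every receiver, whereas an asymmetric sender may deliver different messages to receivers "r1" and "r22".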



6 Abstracting Fault-Masking 

6.1 Abstraction 

Some of the information a process receives in a distributed system may be incor- 
rect due to the existence of faults as described in Sect. 5. A process must have a 
means to mask incorrect information generated by faulty processes. Two of the 
most well-known are (variants of) a majority vote or a middle-value selection, 
as defined in the following paragraph. These functions are similar enough to 
abstract them as a single fault-masking function. 

A majority vote returns the majority value of some multiset (i.e., a set in 
which repetition of values is allowed), and a default value if no majority exists. 
A middle-value selection takes the middle value of a linearly-ordered multiset if 
the cardinality of the multiset is odd. If the cardinality is an even integer n, then 
the natural choices are to compute one of (1) the value at index $\lfloor n/2 \rfloor$, (2) the 
value at index $\lceil n/2 \rceil$, or (3) the average of the two values from (1) and (2). Of 
course, these options may yield different values; in fact, (3) may yield a value 
not present in the multiset. 

For example, for the multiset {1, 1, 2, 2, 2, 2}, the majority value is 2, and the 
middle-value selection is also 2 for any of the three ways to compute the 
middle-value selection. For any multiset that can be linearly-ordered, if a majority value 
exists, then the majority value is equal to the middle-value selection (for any of 
the three ways to compute it mentioned above). 
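
The example can be replayed with a small Python sketch. The function names are hypothetical, multisets are represented simply as lists, and the two middle positions are taken with 0-based list indexing, which is an assumption of this sketch rather than the paper's indexing convention.

```python
from collections import Counter


def majority_vote(values, default=None):
    """Return the value occurring in more than half of `values`, else `default`."""
    if not values:
        return default
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else default


def middle_values(values):
    """Return (lower middle, upper middle, their average) of a nonempty list."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[(n - 1) // 2]   # coincides with `upper` when n is odd
    upper = ordered[n // 2]
    return lower, upper, (lower + upper) / 2


print(majority_vote([1, 1, 2, 2, 2, 2]))   # 2
print(middle_values([1, 1, 2, 2, 2, 2]))   # (2, 2, 2.0)
print(middle_values([1, 2, 3, 4]))         # (2, 3, 2.5)
```

The last call shows the remark above in action: when no majority exists, the three options can differ, and the average (2.5) need not be an element of the multiset.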

The benefit of this abstraction is that we can define a single fault-masking 
function (we call it a fault-masking vote) that can be implemented as either 
a majority vote or a middle-value selection (provided the data over which the 
function is applied is linearly-ordered). 
This allows us to model what are usually considered to be quite distinct 
fault-tolerant distributed algorithms uniformly. Concretely, this abstraction, coupled 
with the other abstractions described in this paper, allows certain clock 
synchronization algorithms (which usually depend on a middle-value selection) and 
algorithms in the spirit of an Oral Messages algorithm [13, 16] (which usually 
depend on a majority vote) to share the same underlying models [20]. 

6.2 Formalization 

The formalization we describe models a majority vote and a middle-value 
selection over a multiset. A small lemma stating their equivalence follows. Definitions 
of standard and minor functions are omitted. 

Based on the NASA Langley Research Center PVS Bags Library [6], a 
multiset is formalized as a function from values to the natural numbers that determines 
how many times a value appears in the multiset (values not present are mapped 
to 0). Thus, let V be a nonempty finite set of values^2, and let $ms : V \to \mathbb{N}$ be a 
multiset. 

To define a majority vote, we define the cardinality of a multiset ms to be 
the summation of value-instances in it: 

\[ |ms| = \sum_{v \in V} ms(v) . \]

The function maj_set takes a multiset ms and returns the set of majority 
values in it. 

\[ \mathit{maj\_set}(ms) = \{ v \mid 2 \times ms(v) > |ms| \} . \]

This set is empty if no majority value exists, or it is a singleton set. Thus, we 
define majority to be a function returning the special constant no_majority if 
no majority value exists and the single majority value otherwise. 

\[
\mathit{majority}(ms) \stackrel{df}{=}
\begin{cases}
\mathit{no\_majority} & : \mathit{maj\_set}(ms) = \emptyset \\
\epsilon(\mathit{maj\_set}(ms)) & : \text{otherwise.}
\end{cases}
\]

The function $\epsilon$ is the choice operator that takes a set and returns an arbitrary 
value in the set if the set is nonempty. Otherwise, an arbitrary value of the same 
type as the elements in the set is returned [19]. 

Now we formalize a middle-value selection. Let V have the linear order $\leq$ 
defined on it. The function mid_val_set takes a multiset and returns the set of 
values at index $\lceil n/2 \rceil$ when the values are ordered from least to greatest (we 
arbitrarily choose this implementation). The set is always a singleton set. 

\[
\mathit{mid\_val\_set}(ms) = \{ v \mid 2 \times |\mathit{lower\_filter}(ms, v)| > |ms| \;\wedge\; 2 \times |\mathit{upper\_filter}(ms, v)| > |ms| \} .
\]

^2 If V is finite, then multisets are finite. Fault-masking votes can only be taken over 
finite multisets. 

The function lower_filter retains the values of ms that are less than 
or equal to v, and upper_filter retains the values greater than or equal to v. 
The function lower_filter is defined as follows: 

\[
\mathit{lower\_filter}(ms, v) = \lambda i.
\begin{cases}
ms(i) & : i \leq v \\
0 & : \text{otherwise.}
\end{cases}
\]

Similarly, 

\[
\mathit{upper\_filter}(ms, v) = \lambda i.
\begin{cases}
ms(i) & : v \leq i \\
0 & : \text{otherwise.}
\end{cases}
\]



The set mid_val_set(ms) is guaranteed to be a singleton set, so using 
the function $\epsilon$ mentioned above, we can define middle_value to return the middle 
value of a multiset: 

\[ \mathit{middle\_value}(ms) \stackrel{df}{=} \epsilon(\mathit{mid\_val\_set}(ms)) . \]



The following theorem results. 

Theorem 1 (Middle Value Is Majority). $\mathit{majority}(ms) \neq \mathit{no\_majority}$ 
implies $\mathit{middle\_value}(ms) = \mathit{majority}(ms)$. 
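
Theorem 1 can be sanity-checked by brute force on small multisets. The Python sketch below represents a multiset as a dictionary from values to counts, following the function-to-naturals formalization. It is a test on finitely many cases, not a proof, and it makes one simplifying assumption: middle_value is computed by expanding and sorting the multiset and taking the element at 0-based position n // 2, rather than through mid_val_set and the choice operator.

```python
from itertools import product

NO_MAJORITY = object()  # stand-in for the special constant no_majority


def cardinality(ms):
    """|ms|: the number of value-instances in the multiset."""
    return sum(ms.values())


def maj_set(ms):
    """Set of values occurring in more than half of the multiset."""
    return {v for v, k in ms.items() if 2 * k > cardinality(ms)}


def majority(ms):
    winners = maj_set(ms)  # empty or a singleton
    return next(iter(winners)) if winners else NO_MAJORITY


def middle_value(ms):
    # Assumption of this sketch: expand, sort, and index at n // 2.
    ordered = sorted(v for v, k in ms.items() for _ in range(k))
    return ordered[len(ordered) // 2]


def theorem_1_holds(ms):
    m = majority(ms)
    return m is NO_MAJORITY or middle_value(ms) == m


# All nonempty multisets over V = {0, 1, 2} with each count below 4.
V = (0, 1, 2)
small_multisets = [dict(zip(V, counts))
                   for counts in product(range(4), repeat=len(V))
                   if sum(counts) > 0]
all_hold = all(theorem_1_holds(ms) for ms in small_multisets)
```

Here all_hold evaluates to True over the 63 multisets enumerated, which is consistent with the theorem but of course does not replace the PVS proof.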



7 Abstracting Communication 

We identify two abstractions with respect to communication. First, we abstract 
the kinds of data communicated. Second, we identify the fundamental conditions 
that must hold for communication to succeed. 



7.1 Abstracting Kinds of Communication 

Some kinds of information can be modelled by real-valued, uniformly 
continuous functions. Intuitively, a function is uniformly continuous if small changes in 
its argument produce small changes in its result; see e.g., Rosenlicht [25]. For 
example, the values of analog clocks and of thermometers vary with time, and 
the rate of change is bounded. In a distributed system, a process may sample 
such a function, i.e., determine an approximation of the function’s value for a 
given input. We call such functions inexact functions and the communication of 
their values inexact communication. We call discrete functions, such as an 
array sorting algorithm, exact functions and communication involving them exact 
communication. 



7.2 Abstracting Communication Conditions 

Communication in a fault-tolerant distributed system is successful if validity and 
agreement hold. For exact communication, their general forms are: 

Exact Validity: A good receiver’s fault-masking vote is equal to the value of 
the function good processes compute. 
Exact Agreement: All good processes have equal fault-masking votes. 

For inexact communication we have similar conditions: 

Inexact Validity: A good receiver’s fault-masking vote is bounded above and 
below by the samples from good processes, up to a small error margin. 
Inexact Agreement: All good processes differ in their fault-masking votes by 
at most a small margin of error. 

A validity property can thus be understood as an agreement between senders and 
receivers, whereas an agreement property is an agreement between the receivers. 
For lack of space, we limit our presentation to guaranteeing validity. Agreement 
is treated similarly, and complete PVS formalizations and proofs for both are 
located at NASA [24]. 

We distinguish between a functional model and a relational model of commu- 
nication. In the former, communication is modeled computationally (e.g., using 
functions like send from Sect. 5). In the latter more abstract model, conditions 
on communication are stated such that if they hold, communication succeeds. 
This section presents a relational model of communication. 

We specifically present conditions that guarantee validity holds after a single 
broadcast communication round in which each process in a set of senders sends 
messages to each process in a set of receivers (a degenerate case is when these are 
singleton sets modeling point-to-point communication between a single sender 
and receiver). A functional model of a specific communication protocol can be 
shown to satisfy these conditions through step-wise refinement. 

First we describe how a single round of exact communication satisfies exact 
validity, provided that the three conditions Majority Good, Exact Function, and 
Function Agreement hold. The three conditions state, respectively, that the ma- 
jority of the values over which a vote is taken come from good senders, that good 
senders compute functions exactly (i.e., there is no approximation in sampling 
an exact function), and that every good sender computes the same function. 

For a single round of inexact communication, we have inexact validity if the 
two conditions Majority Good and Inexact Function hold. Majority Good is the 
same as above. The Inexact Function condition bounds the variance allowed 
between the sample of an inexact function computed by a good process for a 
given input and the actual value of the function for that input. That is, let $\epsilon_l$ 
and $\epsilon_u$ be small positive constants representing the lower and upper variation, 
respectively, allowed between an inexact function $f$ and potential samples of it as 
depicted in Fig. 2. The sample computed by a good process is bounded by $f - \epsilon_l$ 
and $f + \epsilon_u$. We do not present an analog to the Function Agreement condition 
in the inexact case since processes often compute and vote over slightly different 
functions. For example, each process might possess a distinct local sensor that it 
samples. It is assumed, however, that the functions are related, e.g., each sensor 
measures the same aspect of the environment. 

Clock synchronization [17] is an important case of inexact communication. 
Clocks distributed in the system need to be synchronized in order to avoid drift- 
ing too far apart. In this case, sampling a local clock yields an approximation of 
time. 

Fig. 2. The Inexact Function Condition for Inexact Communication 

7.3 Formalization for Exact Communication 

First we present the model of a round of exact communication. For a single 
round of communication, let S be the set of senders sending in that round. Let 
$\mathit{good\_senders} \subseteq S$ be the subset of senders that are good. This set can change 
as processes become faulty and are repaired, so we treat it as a variable rather 
than a constant. For an arbitrary receiver^3, let $\mathit{eligible\_senders} \subseteq S$ be the set of 
senders trusted by the receiver (we assume that receivers trust all good senders). 
Then the condition Majority Good is defined 

\[
\mathit{majority\_good}(\mathit{good\_senders}, \mathit{eligible\_senders}) \stackrel{df}{=}
2 \times |\mathit{good\_senders}| > |\mathit{eligible\_senders}| \;\wedge\;
\mathit{good\_senders} \subseteq \mathit{eligible\_senders} .
\]

This stipulates that a majority of the senders in eligible_senders are in 
good_senders. 

Next we describe the values sent and received. Let MSG be the range of 
the function computed; these are the messages communicated. The variable 
$\mathit{ideal} : S \to \mathit{MSG}$ maps a sender to the exact value of a function to be computed 
by the sender, for a given input. This frees us from representing the particular 
function computed. Similarly, $\mathit{actual} : S \to \mathit{MSG}$ maps a sender to the value 
that sender actually computes for the same function and input. Good senders 
compute exact functions exactly: 

\[
\mathit{exact\_function}(\mathit{good\_senders}, \mathit{ideal}, \mathit{actual}) =
\forall s.\; s \in \mathit{good\_senders} \Rightarrow \mathit{actual}(s) = \mathit{ideal}(s) .
\]

Function Agreement states that the functions computed by any two good senders 
are the same (i.e., they send the same messages). 

\[
\mathit{function\_agreement}(\mathit{good\_senders}, \mathit{ideal}) =
\forall s_1, s_2.\; s_1 \in \mathit{good\_senders} \wedge s_2 \in \mathit{good\_senders} \Rightarrow \mathit{ideal}(s_1) = \mathit{ideal}(s_2) .
\]

^3 The receiver can be any receiver, good or faulty. The abstractions described in Sect. 5 
allow us to ignore the fault status of receivers in formal analysis. 

Before stating the validity result, we must take care of a technical detail 
with respect to forming the multiset of messages over which a receiver takes a 
fault-masking vote. For an arbitrary receiver, let the function make_bag take as 
arguments a nonempty set eligible_senders and a function mapping senders to 
the message the receiver gets. It returns a multiset of received messages from 
senders in eligible_senders. 

\[
\mathit{make\_bag}(\mathit{eligible\_senders}, \mathit{actual}) =
\lambda v.\; |\{ s \mid s \in \mathit{eligible\_senders} \wedge \mathit{actual}(s) = v \}| .
\]

For exact messages, validity is the proposition that for any good sender, 
the exact value of the function it is to compute is the value computed by the 
receiver’s fault-masking vote. This proposition is defined as follows: 

\[
\mathit{exact\_validity}(\mathit{eligible\_senders}, \mathit{good\_senders}, \mathit{ideal}, \mathit{actual}) =
\forall s.\; s \in \mathit{good\_senders} \Rightarrow
\mathit{ideal}(s) = \mathit{majority}(\mathit{make\_bag}(\mathit{eligible\_senders}, \mathit{actual})) .
\]

We use majority for the fault-masking vote, but middle-value selection is 
acceptable given Thm. 1. Using majority, the Exact Validity Theorem reads: 

Theorem 2 (Exact Validity). 

majority_good(good_senders, eligible_senders) and 
exact_function(good_senders, ideal, actual) and 
function_agreement(good_senders, ideal) 
implies that 

exact_validity(eligible_senders, good_senders, ideal, actual). 
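
The statement of Theorem 2 can be exercised on a concrete round with the Python sketch below. Sets of senders are Python sets, and ideal and actual are dictionaries. The bodies of exact_function and function_agreement are this sketch's reading of the prose descriptions in this section (good senders compute the ideal value, and all good senders share one ideal value), so they should be treated as assumptions rather than as the PVS definitions.

```python
NO_MAJORITY = object()  # stand-in for the special constant no_majority


def make_bag(eligible_senders, actual):
    """Multiset (value -> count) of messages received from eligible senders."""
    bag = {}
    for s in eligible_senders:
        bag[actual[s]] = bag.get(actual[s], 0) + 1
    return bag


def majority(ms):
    total = sum(ms.values())
    for v, k in ms.items():
        if 2 * k > total:
            return v
    return NO_MAJORITY


def majority_good(good_senders, eligible_senders):
    return (2 * len(good_senders) > len(eligible_senders)
            and good_senders <= eligible_senders)


def exact_function(good_senders, ideal, actual):
    # Assumed reading: every good sender computes its ideal value exactly.
    return all(actual[s] == ideal[s] for s in good_senders)


def function_agreement(good_senders, ideal):
    # Assumed reading: all good senders are to compute the same value.
    return len({ideal[s] for s in good_senders}) <= 1


def exact_validity(eligible_senders, good_senders, ideal, actual):
    vote = majority(make_bag(eligible_senders, actual))
    return all(ideal[s] == vote for s in good_senders)


eligible = {"a", "b", "c"}
ideal = {"a": 7, "b": 7, "c": 7}

# Two good senders out of three: the hypotheses hold, and so does validity.
good = {"a", "b"}
actual = {"a": 7, "b": 7, "c": 99}
hypotheses = (majority_good(good, eligible)
              and exact_function(good, ideal, actual)
              and function_agreement(good, ideal))
valid = exact_validity(eligible, good, ideal, actual)

# Only one good sender: Majority Good fails, and the vote can be wrong.
few_good = {"a"}
bad_actual = {"a": 7, "b": 99, "c": 99}
valid_without_majority = exact_validity(eligible, few_good, ideal, bad_actual)
```

In the first scenario both hypotheses and valid are True; in the second, majority_good is False and valid_without_majority is False, illustrating why the Majority Good condition is needed.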



7.4 Formalization for Inexact Communication 

Next we model a round of inexact communication. The Majority Good condition 
is formalized the same as for exact communication. To define Inexact Function, 
we now assume that the elements of MSG have at least the structure of an 
additive group linearly ordered by $\leq$. Inexact Function is defined as the conjunction 
of two conditions, Lower Function Error and Upper Function Error. These two 
conditions specify, respectively, the maximal negative and positive error between 
the exact value of an inexact function and a good sender’s approximation of the 
inexact function, for a given input. 

\[
\mathit{lower\_function\_error}(\mathit{good\_senders}, \mathit{ideal}, \mathit{actual}) =
\forall s.\; s \in \mathit{good\_senders} \Rightarrow \mathit{ideal}(s) - \epsilon_l \leq \mathit{actual}(s) ;
\]

\[
\mathit{upper\_function\_error}(\mathit{good\_senders}, \mathit{ideal}, \mathit{actual}) =
\forall s.\; s \in \mathit{good\_senders} \Rightarrow \mathit{actual}(s) \leq \mathit{ideal}(s) + \epsilon_u ;
\]

\[
\mathit{inexact\_function}(\mathit{good\_senders}, \mathit{ideal}, \mathit{actual}) =
\mathit{lower\_function\_error}(\mathit{good\_senders}, \mathit{ideal}, \mathit{actual}) \;\wedge\;
\mathit{upper\_function\_error}(\mathit{good\_senders}, \mathit{ideal}, \mathit{actual}) .
\]

For inexact communication, validity is the proposition that for a fixed 
receiver, the value determined by a fault-masking vote is bounded both above and 
below by the messages received from good senders, modulo error values $\epsilon_l$ and 
$\epsilon_u$. Note that each sender may be computing a different inexact function, so 
the vote window depends on both the functions computed as well as the errors 
in approximating them. This is illustrated in Fig. 3, where $s_1$ and $s_2$ are good 
senders. 



[Figure not reproduced; its labels are: vote window, ideal(s_1), ideal(s_2).] 

Fig. 3. Inexact Validity 



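\[
\begin{aligned}
&\mathit{inexact\_validity}(\mathit{eligible\_senders}, \mathit{good\_senders}, \mathit{ideal}, \mathit{actual}) = \\
&\quad \exists s_1.\; s_1 \in \mathit{good\_senders} \;\wedge\;
\mathit{ideal}(s_1) - \epsilon_l \leq \mathit{middle\_value}(\mathit{make\_bag}(\mathit{eligible\_senders}, \mathit{actual})) \;\wedge \\
&\quad \exists s_2.\; s_2 \in \mathit{good\_senders} \;\wedge\;
\mathit{middle\_value}(\mathit{make\_bag}(\mathit{eligible\_senders}, \mathit{actual})) \leq \mathit{ideal}(s_2) + \epsilon_u .
\end{aligned}
\]

The Inexact Validity Theorem then reads: 

Theorem 3 (Inexact Validity). 

majority_good(good_senders, eligible_senders) and 
inexact_function(good_senders, ideal, actual) 
implies that 

inexact_validity(eligible_senders, good_senders, ideal, actual). 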
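
A concrete round of inexact communication can be checked with the Python sketch below. Integer readings are used to avoid floating-point rounding, eps_l and eps_u are passed as explicit parameters (in the text they are fixed constants), and middle_value is computed by sorting the received values and taking the element at 0-based position n // 2, which is an assumption of this sketch.

```python
def middle_value(values):
    """Middle element of the sorted list (assumed choice for even length)."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def majority_good(good_senders, eligible_senders):
    return (2 * len(good_senders) > len(eligible_senders)
            and good_senders <= eligible_senders)


def inexact_function(good_senders, ideal, actual, eps_l, eps_u):
    """Conjunction of Lower Function Error and Upper Function Error."""
    return all(ideal[s] - eps_l <= actual[s] <= ideal[s] + eps_u
               for s in good_senders)


def inexact_validity(eligible_senders, good_senders, ideal, actual, eps_l, eps_u):
    vote = middle_value([actual[s] for s in eligible_senders])
    lower_ok = any(ideal[s] - eps_l <= vote for s in good_senders)
    upper_ok = any(vote <= ideal[s] + eps_u for s in good_senders)
    return lower_ok and upper_ok


eligible = {"a", "b", "c"}
ideal = {"a": 100, "b": 102, "c": 101}   # e.g. exact clock values per sender
eps_l = eps_u = 1

# Two good senders whose samples stay within the allowed error.
good = {"a", "b"}
actual = {"a": 101, "b": 102, "c": 500}  # c is faulty
hypotheses = (majority_good(good, eligible)
              and inexact_function(good, ideal, actual, eps_l, eps_u))
valid = inexact_validity(eligible, good, ideal, actual, eps_l, eps_u)

# Only one good sender: the faulty majority drags the vote out of the window.
few_good = {"a"}
bad_actual = {"a": 101, "b": 500, "c": 500}
valid_without_majority = inexact_validity(
    eligible, few_good, ideal, bad_actual, eps_l, eps_u)
```

In the first scenario the vote is 102, which lies inside the window from ideal(a) - 1 to ideal(b) + 1, so valid is True. In the second scenario the vote is 500 and validity fails, as expected when Majority Good does not hold.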

8 Conclusion 

This paper presents, in the language of higher-order logic, four kinds of abstrac- 
tions for fault-tolerant distributed systems. These abstractions pertain to mes- 
sages, faults, fault-masking, and communication. We believe that they abstract 
a wide variety of fault-tolerant distributed systems. 

Other useful abstractions have been developed, too. For example, Rushby 
demonstrates how to derive a time-triggered system from the specification of the 
system as a (synchronous) functional program [27]. This work has been used in 
the specification and verification of the Time-Triggered Architecture [23]. With 
respect to these works, the abstractions we give systematize specification and 
verification at the level of the functional programs. 

Our abstractions have proved their merit in an industrial-scale formal spec- 
ification and verification project. We are sure that similar projects will profit. 
We are developing a distributed fault-tolerance library as part of the SPIDER 
project. It is designed to be a generic library of PVS theories that may be used 
in the specification and verification of a wide variety of fault-tolerant distributed 
systems. The abstractions described in this paper form the backbone of the li- 
brary. 



Abstract. Inter alia, Lebesgue-style integration plays a major role in 
advanced probability. We formalize a significant part of its theory in 
Higher Order Logic using Isabelle/Isar. This involves concepts of elemen- 
tary measure theory, real-valued random variables as Borel-measurable 
functions, and a stepwise inductive definition of the integral itself. Build- 
ing on previous work about formal verification of probabilistic algo- 
rithms, we exhibit an example application in this domain; another prim- 
itive for randomized functional programming is developed to this end. 



1 Prologue 



Verifying more examples of probabilistic algorithms will inevitably neces- 
sitate more formalization; in particular we already can see that a theory 
of expectation will be required to prove the correctness of probabilistic 
quicksort. If we can continue our policy of formalizing standard theorems 
of mathematics to aid verifications, then this will provide long-term ben- 
efits to many users of the HOL theorem prover. Joe Hurd 

This quote from the Future Work section of Joe Hurd’s PhD thesis “Formal
Verification of Probabilistic Algorithms” [7, p. 131] served as a starting point for
the work presented here. Integration translates to expectation in probability
theory. The concept of a measure lies at the heart of Lebesgue-style integration¹.
Because the definition does not employ such concrete entities as intervals, it 
generalizes easily to functions that do not have the real numbers as their domain. 
In particular, the notion of measure is very natural in the field of probability 
theory. 

The so-called gauge or Kurzweil-Henstock integral is a strictly stronger concept.
That most powerful integral has even been formalized for functions over the reals
in the HOL theorem prover by Harrison [6]. However, the simplicity that makes
it so elegant in real analysis (especially over compact intervals) seems to get lost
in more general cases, as the intuition behind it is very similar to the Riemann
construction, which depends heavily on the structure of the real numbers as
domain.

¹ A measure is simply a function mapping sets to real numbers, which satisfies a few
sanity properties (cf. 2.3).

We begin by declaring some preliminary notions, including elementary measure 
theory and monotone convergence. This leads into measurable real-valued func- 
tions, also known as random variables. A sufficient body of functions is shown to 
belong to this class. A lot of the theory in this section has also been formalized 
within the Mizar project [3,4]. The abstract of the second source hints that it 
was also planned as a stepping stone for Lebesgue integration, though further 
results in this line could not be found. 

The central section is about integration proper. We build the integral for in- 
creasingly complex functions and prove essential properties, discovering the con- 
nection with measurability in the end. To my knowledge, no similar theory had 
been developed in a theorem prover up to this point. It enables formalization of 
results that require general concepts of integration, such as average case analysis 
of algorithms. 

Before closing with a short summary and suggestions for future work, we test our 
achievements in an application. The first moment method is applied to k-SAT.
Though the setup is simple enough in terms of integration, a new primitive is 
needed to represent the probabilistic programs involved. 

As stated before, the formalization is performed in the theorem prover Isabelle 
[12,11], using the Isar language [14], the HOL logic [10], and the Real/Complex
theory [5]. 

2 Measurable Functions 

2.1 Sigma Algebras 
theory Sigma-Algebra2 = Main:

The theory command commences a formal document and enumerates the the-
ories it depends on. With the Main theory, a standard selection of useful HOL
theories excluding the real numbers is loaded. Sigma-Algebra2 is built upon
Sigma-Algebra, a tiny example demonstrating the use of inductive definitions by
Markus Wenzel. This theory as well as Measure in 2.3 is heavily influenced by 
Joe Hurd’s thesis [7] and has been designed to keep the terminology as consistent 
as possible with that work. 

Sigma algebras are an elementary concept in measure theory. To measure — 
that is to integrate — functions, we first have to measure sets. Unfortunately, 
when dealing with a large universe, it is often not possible to consistently assign a 
measure to every subset. Therefore it is necessary to define the set of measurable 
subsets of the universe. A sigma algebra is such a set that has three very natural 
and desirable properties. 

constdefs
  sigma-algebra:: 'a set set ⇒ bool
  sigma-algebra A ≡
    {} ∈ A ∧ (∀a. a ∈ A ⟶ -a ∈ A) ∧
    (∀a. (∀i::nat. a i ∈ A) ⟶ (⋃i. a i) ∈ A)

Mind that the third condition expresses the fact that the union of countably 
many sets in A is again a set in A without explicitly defining the notion of 
countability. 

Sigma algebras can naturally be created as the closure of any set of sets with 
regard to the properties just postulated. Markus Wenzel wrote the following 
inductive definition of the sigma operator. 

consts
  sigma :: 'a set set ⇒ 'a set set

inductive sigma A
intros
  basic: a ∈ A ⟹ a ∈ sigma A
  empty: {} ∈ sigma A
  complement: a ∈ sigma A ⟹ -a ∈ sigma A
  Union: (⋀i::nat. a i ∈ sigma A) ⟹ (⋃i. a i) ∈ sigma A

There are a few rather obvious facts to prove about sigma algebras, like the 
universe itself being contained in them as well as the empty set, but they have 
to be left out. 
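The closure idea behind the sigma operator can be tried out on a finite universe, where countable unions reduce to finite ones. The following Python sketch is only an illustration under that assumption; the function names sigma and is_sigma_algebra are chosen here to mirror the definitions above and are not part of the formal development.

```python
from itertools import combinations


def sigma(universe, generators):
    """Close a family of subsets of a finite universe under the four rules
    basic, empty, complement and Union (finite unions suffice here)."""
    universe = frozenset(universe)
    family = {frozenset()} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:  # complement rule
            comp = universe - a
            if comp not in family:
                family.add(comp)
                changed = True
        for a, b in combinations(current, 2):  # union rule
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family


def is_sigma_algebra(universe, family):
    """Check the three defining properties on a finite universe."""
    universe = frozenset(universe)
    return (frozenset() in family
            and all(universe - a in family for a in family)
            and all(a | b in family for a in family for b in family))
```

For the universe {1, 2, 3, 4}, the generator {1} yields the four sets {}, {1}, {2, 3, 4} and the universe itself, which also shows that the universe is always contained in the result.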



2.2 Monotone Convergence 
theory MonConv = Lim: 

A sensible requirement for an integral operator is that it be “well-behaved” with 
respect to limit functions. To become just a little more precise, it is expected 
that the limit operator may be interchanged with the integral operator under 
conditions that are as weak as possible. To this end, the notion of monotone 
convergence is introduced and later applied in the definition of the integral. 

In fact, we distinguish three types of monotone convergence here: There are 
converging sequences of real numbers, real functions and sets. Monotone conver- 
gence could even be defined more generally for any type in the axiomatic type 
class² ord of ordered types like this:

  mon-conv u f ≡ (∀n. u n ≤ u (Suc n)) ∧ isLub UNIV (range u) f

However, this employs the general concept of a least upper bound. For the special 
types we have in mind, the more specific limit — respective union — operators 
are available, combined with many theorems about their properties. It still seems 
worthwhile to add the type of real- (or rather ordered-) valued functions to the 
ordered types by defining the less-or-equal relation pointwise. 

instance fun :: (type, ord) ord ..

defs
  le-fun-def: f ≤ g ≡ ∀x. f x ≤ g x

² For the concept of axiomatic type classes, see [9,15].

To express the similarity of the different types of convergence, a single overloaded 
operator is used. 

consts
  mon-conv:: (nat ⇒ 'a) ⇒ 'a::ord ⇒ bool (-↑-)

defs (overloaded)
  real-mon-conv: x↑(y::real) ≡ (∀n. x n ≤ x (Suc n)) ∧ x ----> y

  realfun-mon-conv:
    u↑(f::'a ⇒ real) ≡ (∀n. u n ≤ u (Suc n)) ∧ (∀w. (λn. u n w) ----> f w)

  set-mon-conv: A↑(B::'a set) ≡ (∀n. A n ≤ A (Suc n)) ∧ B = (⋃n. A n)

lemma realfun-mon-conv-iff: (u↑f) = (∀w. (λn. u n w)↑((f w)::real))

The long arrow signifies convergence of real sequences as defined in the theory 
SEQ [5]. Monotone convergence for real functions is simply pointwise monotone 
convergence. Quite a few properties of these definitions will be necessary later, 
but none of them are of intrinsic interest or difficulty. 



2.3 Measure Spaces 

theory Measure = Sigma-Algebra2 + MonConv + NthRoot:

Now we are already set for the central concept of measure. The following def-
initions are translated as faithfully as possible from those in Joe Hurd’s thesis
[7].

constdefs
  measurable:: 'a set set ⇒ 'b set set ⇒ ('a ⇒ 'b) set
  measurable F G ≡ {f. ∀g∈G. f -` g ∈ F}

So a function is called F-G-measurable if and only if the inverse image of any set
in G is in F. F and G are usually the sets of measurable sets, the first component
of a measure space³.

  measurable-sets:: ('a set set * ('a set ⇒ real)) ⇒ 'a set set
  measurable-sets ≡ fst

  measure:: ('a set set * ('a set ⇒ real)) ⇒ ('a set ⇒ real)
  measure ≡ snd

The other component is the measure itself. It is a function that assigns a non- 
negative real number to every measurable set and has the property of being 
countably additive for disjoint sets. 

³ In standard mathematical notation, the universe is first in a measure space triple,
but in our definitions, following Joe Hurd, it is always the whole type universe and 
therefore omitted. 
  positive:: ('a set set * ('a set ⇒ real)) ⇒ bool
  positive M ≡ measure M {} = 0 ∧
    (∀A. A ∈ measurable-sets M ⟶ 0 ≤ measure M A)

  countably-additive:: ('a set set * ('a set ⇒ real)) ⇒ bool
  countably-additive M ≡ (∀f::(nat ⇒ 'a set). range f ⊆ measurable-sets M
    ∧ (∀m n. m ≠ n ⟶ f m ∩ f n = {}) ∧ (⋃i. f i) ∈ measurable-sets M
    ⟶ (λn. measure M (f n)) sums measure M (⋃i. f i))

This last property deserves some comments. The conclusion is usually — also in 
the aforementioned source — phrased as 
  measure M (⋃i. f i) = (∑n. measure M (f n)).

In our formal setting this is unsatisfactory, because the sum operator⁴, like any
HOL function, is total, although a series obviously need not converge. It is de-
fined using the ε operator, and its behavior is unspecified in the diverging case.
Hence, the above assertion would give no information about the convergence of
the series. Furthermore, the definition contains redundancy. Assuming that the
countable union of sets is measurable is unnecessary when the measurable sets
form a sigma algebra, which is postulated in the final definition⁵.

  measure-space:: ('a set set * ('a set ⇒ real)) ⇒ bool
  measure-space M ≡ sigma-algebra (measurable-sets M) ∧
    positive M ∧ countably-additive M

Note that our definition is restricted to finite measure spaces, that is, measure
M UNIV < ∞, since the measure must be a real number for any measurable
set. In probability, this is naturally the case.
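To make the two conditions positive and countably-additive tangible, here is a small Python sketch on a three-point probability space. It is a finite stand-in only: additivity is checked for pairs of disjoint sets, and the names powerset, is_positive, is_additive and the weights are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import chain, combinations


def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_positive(sets, mu):
    """mu {} = 0 and mu is nonnegative on every measurable set."""
    return mu(frozenset()) == 0 and all(mu(a) >= 0 for a in sets)


def is_additive(sets, mu):
    """Finite analogue of countable additivity for disjoint measurable sets."""
    return all(mu(a | b) == mu(a) + mu(b)
               for a in sets for b in sets
               if not (a & b) and (a | b) in sets)


# Hypothetical example data: three outcomes with total weight 1.
weights = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
sets = powerset(weights)


def mu(a):
    """Measure of a set as the sum of the weights of its elements."""
    return sum((weights[w] for w in a), Fraction(0))
```

Since mu assigns the value 1 to the whole universe, the pair (sets, mu) plays the role of a finite measure space in the sense of the restriction noted above.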

Two important theorems close this section. Both appear in Hurd’s work as well, 
but are shown anyway, owing to their central role in measure theory. The first one 
is a mighty tool for proving measurability. It states that for a function mapping 
one sigma algebra into another, it is sufficient to be measurable regarding only 
a generator of the target sigma algebra. Formalizing the interesting proof out of 
Bauer’s textbook [1] is relatively straightforward using rule induction. 

theorem assumes sigma-algebra a and f ∈ measurable a b
  shows measurable-lift: f ∈ measurable a (sigma b)

The case is different for the second theorem, which Joe Hurd calls the Monotone 
Convergence Theorem, though in mathematical literature this name is often 
reserved for a similar fact about integrals that we will prove in 3.2. It is only five 
lines in the book, but almost 200 in formal text. 

theorem assumes measure-space M and ⋀n. A n ∈ measurable-sets M and A↑B
  shows measure-mon-conv: (λn. measure M (A n)) ----> measure M B

The claim made here is that the measures of monotonically convergent sets
approach the measure of their limit. By the way, the necessity for the above-
mentioned change in the definition of countably additive was detected only in
the formalization of this proof.

⁴ Which is merely syntactic sugar for the suminf functional from the Series theory [5].

⁵ Joe Hurd inherited this practice from a very influential probability textbook [16].



2.4 Real-Valued Random Variables
theory RealRandVar = Measure + Rats: 

While most of the above material was modeled after Hurd’s work (but still proved
independently), the original content basically starts here⁶. From now on, we will
specialize in functions that map into the real numbers and are measurable with
respect to the canonical sigma algebra on the reals, the Borel sigma algebra.
These functions will be called real-valued random variables. The terminology
is slightly imprecise, as random variables hint at a probability space, which
usually requires measure M UNIV = 1. Notwithstanding, as we regard only
finite measures (cf. 2.3), this condition can easily be achieved by normalization.
After all, the other standard name, “measurable functions”, is even less precise.
As mentioned in the introduction, there have been Mizar realizations of related
material [3,4]. The main difference is in the use of extended real numbers (the
reals together with ±∞) in those documents. It is established practice in
measure theory to allow infinite values, but “(...) we felt that the complications
that this generated (...) more than canceled out the gain in uniformity (...),
and that a simpler theory resulted from sticking to the standard real numbers.”
[7, p. 32f]. Hurd also advocates going directly to the hyper-reals, should the
need for infinite measures arise; I share his views in this respect.

constdefs
  Borelsets:: real set set (𝔹)
  𝔹 ≡ sigma {S. ∃u. S = {..u}}

  rv:: ('a set set * ('a set ⇒ real)) ⇒ ('a ⇒ real) set
  rv M ≡ {f. measure-space M ∧ f ∈ measurable (measurable-sets M) 𝔹}

As explained in the first paragraph, the preceding definitions⁷ determine the
rest of this section. There are many ways to define the Borel sets. For example,
taking into account only rationals for u would also have worked out above, but
we can take the reals to simplify things. The smallest sigma algebra containing
all the open (or closed) sets is another alternative; the multitude of possibilities
testifies to the relevance of the concept.

⁶ There are two main reasons why the above has not been imported like the probability
space in the application (4.1). Firstly, there are inconveniences caused by different
conventions in HOL, foremost the use of predicates instead of sets, that make the
consistent use of such basic definitions impractical. What is more, the import tool
simply was not available at the time these theories were written.

⁷ The notation {..u} signifies the interval from negative infinity to u, with u included.



Formalizing Integration Theory 277 



The latter path leads the way to the fact that any continuous function is measur-
able. Generalization to ℝⁿ brings another unified way to prove all the measura-
bility theorems in this theory plus, for instance, measurability of the trigonomet-
ric and exponential functions. This approach is detailed in another influential
textbook by Billingsley [2]. It requires some concepts of topological spaces, which
made the following elementary course, based on Bauer’s excellent book [1], seem
more feasible.

Two more definitions go next. The image measure, law, or distribution — the last 
term being specific to probability — of a measure with respect to a measurable 
function is calculated as the measure of the inverse image of a set. Characteristic 
functions will be frequently needed in the rest of the development. 

  distribution::
    ('a set set * ('a set ⇒ real)) ⇒ ('a ⇒ real) ⇒ (real set ⇒ real) (law)
  f ∈ rv M ⟹ law M f ≡ (measure M) ∘ (vimage f)

  characteristic-function:: 'a set ⇒ ('a ⇒ real) (χ-)
  χ A x ≡ if x ∈ A then 1 else 0
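Both definitions are easy to mimic on a finite space. The Python sketch below computes the image measure as the measure of a preimage, with target sets given as predicates on values; the fair die and the names chi, law and law_f are assumptions made for this example only.

```python
from fractions import Fraction


def chi(a):
    """Characteristic function of the set a."""
    return lambda x: 1 if x in a else 0


def law(universe, mu, f):
    """Image measure of mu under f: a target set B (given as a predicate)
    is mapped to the measure of its preimage under f."""
    def image_measure(pred):
        preimage = frozenset(w for w in universe if pred(f(w)))
        return mu(preimage)
    return image_measure


# Hypothetical example: a fair die and the parity of the outcome.
universe = range(1, 7)


def mu(a):
    """Uniform probability on the six outcomes."""
    return Fraction(len(a), 6)


law_f = law(universe, mu, lambda w: w % 2)
```

Here law_f applied to the predicate for the interval {..0} measures the even outcomes of the die.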

Now that random variables are defined, we aim to show that a broad class of 
functions belongs to them. For a constant function this is easy, as there are only 
two possible preimages. Characteristic functions produce four cases already. 

theorem assumes measure-space M and A ∈ measurable-sets M
  shows char-rv: χ A ∈ rv M

For more intricate functions, the following application of the measurability lifting 
theorem from 2.3 gives a useful characterization. 

theorem assumes measure-space M
  shows rv-le-iff: (f ∈ rv M) = (∀a. {w. f w ≤ a} ∈ measurable-sets M)

As a first application we show that addition and multiplication with constants
preserve measurability. Quite a few properties of the real numbers are employed
in the proof. For the general case of addition, we need one more set to be mea-
surable, namely {w. f w ≤ g w}. This follows from a like statement for <. A
dense and countable subset of the reals is needed to establish it. Of course, the
rationals come to mind. They were not available in Isabelle/HOL⁸, so I built a
theory with the necessary properties on my own. It is omitted for the sake of
brevity.

theorem assumes f: f ∈ rv M and g: g ∈ rv M
  shows rv-plus-rv: (λw. f w + g w) ∈ rv M
proof -
  from g have ms: measure-space M by (simp add: rv-def)
  { fix a
    have {w. a ≤ f w + g w} = {w. a + (g w)*(-1) ≤ f w}
      by auto
    also from g have (λw. a + (g w)*(-1)) ∈ rv M
      by (rule affine-rv)
    with f have {w. a + (g w)*(-1) ≤ f w} ∈ measurable-sets M
      by (simp add: rv-le-rv-measurable)
    finally have {w. a ≤ f w + g w} ∈ measurable-sets M .
  }
  with ms show ?thesis
    by (simp add: rv-ge-iff)
qed

⁸ At least not as a subset of the reals, to the definition of which a type of positive
rational numbers contributed [5].

To show preservation of measurability by multiplication, it is expressed by addi- 
tion and squaring. This requires a few technical lemmata including one stating 
measurability for squares. 

theorem assumes f ∈ rv M and g ∈ rv M
  shows rv-times-rv: (λw. f w * g w) ∈ rv M

Measurability for limit functions of monotone convergent series is also surpris- 
ingly straightforward. 

theorem assumes ⋀n. u n ∈ rv M and u↑f shows mon-conv-rv: f ∈ rv M

Before we end this section to start the formalization of the integral proper, 
there is one more concept missing: The positive and negative part of a function. 
Their definition is quite intuitive, and some useful properties have been proven, 
including the fact that they are random variables, provided that their argument 
functions are measurable. 

constdefs
  nonnegative:: ('a ⇒ ('b::{ord,zero})) ⇒ bool
  nonnegative f ≡ ∀x. 0 ≤ f x

  positive-part:: ('a ⇒ ('b::{ord,zero})) ⇒ ('a ⇒ 'b) (pp)
  pp f x ≡ if 0 ≤ f(x) then f x else 0

  negative-part:: ('a ⇒ ('b::{ord,zero,minus})) ⇒ ('a ⇒ 'b) (np)
  np f x ≡ if 0 ≤ f(x) then 0 else -f(x)

lemma f-plus-minus: ((f x)::real) = pp f x - np f x

theorem pp-np-rv-iff: ((f::'a ⇒ real) ∈ rv M) = (pp f ∈ rv M ∧ np f ∈ rv M)
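The decomposition stated in f-plus-minus can be checked pointwise with a few lines of Python. This is a sketch for illustration; positive_part and negative_part are names chosen here for pp and np.

```python
def positive_part(f):
    """pp f x = f x if 0 <= f x, else 0."""
    return lambda x: f(x) if 0 <= f(x) else 0


def negative_part(f):
    """np f x = 0 if 0 <= f x, else -(f x)."""
    return lambda x: 0 if 0 <= f(x) else -f(x)
```

Both parts are nonnegative functions, and their difference gives back f at every point.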



3 Integration 

theory Integral = RealRandVar + SetsumThms:

3.1 Simple Functions 

A simple function is a finite sum of characteristic functions, each multiplied with 
a nonnegative constant. These functions must be parametrized by measurable 
sets. Note that to check this condition, a tuple consisting of a set of measur-
able sets and a measure is required as the integral operator’s second argument,
whereas the measure only is given in informal notation. Usually the tuple will 
be a measure space, though it is not required so by the definition at this point. 
It is most natural to declare the value of the integral in this elementary case by 
simply replacing the characteristic functions with the measures of their respec- 
tive sets. Uniqueness remains to be shown, for a function may have infinitely 
many decompositions and these might give rise to more than one integral value. 
This is why we construct a simple function integral set for any function and 
measurable sets/measure pair by means of an inductive set definition containing 
but one introduction rule. 

consts
  sfis:: ('a ⇒ real) ⇒ ('a set set * ('a set ⇒ real)) ⇒ real set
inductive sfis f M
intros
  base: ⟦f = (λt. ∑i∈(S::nat set). x i * χ (A i) t);
    ∀i∈S. A i ∈ measurable-sets M; nonnegative x; finite S;
    ∀i∈S. ∀j∈S. i ≠ j ⟶ A i ∩ A j = {}; (⋃i∈S. A i) = UNIV⟧
    ⟹ (∑i∈S. x i * measure M (A i)) ∈ sfis f M

As you can see we require two extra conditions, and they amount to the sets 
being a partition of the universe. We say that a function is in normal form 
if it is represented this way. Normal forms are only needed to show additivity 
and monotonicity of simple function integral sets. These theorems can then be 
used in turn to get rid of the normality condition. More precisely, normal forms 
play a central role in the sfis-present lemma. For two simple functions with 
different underlying partitions it states the existence of a common finer-grained 
partition that can be used to represent the functions uniformly. The proof is 
remarkably lengthy though the idea seems rather simple. The difficulties stem
from translating the informal use of sum notation, which permits a change of
index sets and allows for pairs of indices.

lemma assumes measure-space M and a ∈ sfis f M and b ∈ sfis g M
  shows sfis-present: ∃z1 z2 C K.
    f = (λt. ∑i∈(K::nat set). z1 i * χ (C i) t) ∧ g = (λt. ∑i∈K. z2 i * χ (C i) t)
    ∧ a = (∑i∈K. z1 i * measure M (C i)) ∧ b = (∑i∈K. z2 i * measure M (C i))
    ∧ finite K ∧ (∀i∈K. ∀j∈K. i ≠ j ⟶ C i ∩ C j = {})
    ∧ (∀i∈K. C i ∈ measurable-sets M) ∧ (⋃i∈K. C i) = UNIV
    ∧ nonnegative z1 ∧ nonnegative z2

Additivity and monotonicity are now almost obvious, the latter trivially imply- 
ing uniqueness. The integral of characteristic functions as well as the effect of 
multiplication with a constant follow directly from the definition. Together with 
a generalization of the addition theorem to setsums, a less restrictive introduc- 
tion rule emerges, making normal forms obsolete. It is only valid in measure 
spaces though. 
lemma assumes measure-space M and ∀i∈S. A i ∈ measurable-sets M
  and nonnegative x and finite S
  shows sfis-intro: (∑i∈S. x i * measure M (A i))
    ∈ sfis (λt. ∑i∈(S::nat set). x i * χ (A i) t) M
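The uniqueness question raised above can be illustrated on a four-point space: two different decompositions of the same simple function lead to the same integral value. The Python sketch below assumes a uniform measure; chi, simple_function, simple_integral and the concrete coefficients are hypothetical names and data for this example.

```python
from fractions import Fraction


def chi(a):
    """Characteristic function of the set a."""
    return lambda x: 1 if x in a else 0


def simple_function(coeffs, sets):
    """The function t -> sum of coeffs[i] * chi(sets[i])(t)."""
    return lambda t: sum(c * chi(a)(t) for c, a in zip(coeffs, sets))


def simple_integral(coeffs, sets, mu):
    """Replace each characteristic function by the measure of its set."""
    return sum(c * mu(a) for c, a in zip(coeffs, sets))


universe = {1, 2, 3, 4}


def mu(a):
    """Uniform measure of total mass 1 on the four points."""
    return Fraction(len(a), 4)


# Two decompositions of the same function, the second one finer-grained.
c1, s1 = [2, 5], [{1, 2}, {3, 4}]
c2, s2 = [2, 2, 5, 5], [{1}, {2}, {3}, {4}]
```

The second decomposition refines the first, in the spirit of the common partition provided by sfis-present.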

3.2 Nonnegative Functions 

There is one more important fact about sfis, easily the hardest one to see. It 
is about the relationship with monotone convergence and paves the way for 
a sensible definition of nnfis, the nonnegative function integral sets, enabling 
monotonicity and thus uniqueness. A reasonably concise formal proof could for- 
tunately be achieved in spite of the nontrivial ideas involved — compared for 
instance to the intuitive but hard-to-formalize sfis-present. 

lemma assumes u↑f and ⋀n. x n ∈ sfis (u n) M and x↑y
  and r ∈ sfis s M and s ≤ f and measure-space M
  shows sfis-mon-conv-mono: r ≤ y

Now we are ready for the second step. The integral of a monotone limit of 
functions is the limit of their integrals. Note that this last limit has to exist in 
the first place, since we decided not to use infinite values. Backed by the last 
theorem and the preexisting knowledge about limits, the usual basic properties 
are straightforward.

consts
  nnfis:: ('a ⇒ real) ⇒ ('a set set * ('a set ⇒ real)) ⇒ real set
inductive nnfis f M
intros
  base: ⟦u↑f; ⋀n. x n ∈ sfis (u n) M; x↑y⟧ ⟹ y ∈ nnfis f M

We close this subsection with a classic theorem by Beppo Levi, the monotone 
convergence theorem. In essence, it says that the introduction rule for nnfis holds 
not only for sequences of simple functions, but for any sequence of nonnegative 
integrable functions. It should be mentioned that this theorem cannot be formu- 
lated for the Riemann integral. We prove it by exhibiting a sequence of simple 
functions that converges to the same limit as the original one and then applying 
the introduction rule. By definition, for any f_n in the original sequence, there is
a sequence (u_mn), m ∈ ℕ, of simple functions converging to it. The nth element of
the new sequence is then defined as the upper closure of the nth elements of the 
first n sequences. 

theorem assumes f↑h and ⋀n. x n ∈ nnfis (f n) M
  and x↑y and measure-space M
  shows nnfis-mon-conv: y ∈ nnfis h M

3.3 Integrable Functions 

Before we take the final step of defining integrability and the integral operator, 
we should first clarify what kind of functions we are able to integrate up to now. 
It is easy to see that all nonnegative integrable functions are random variables. 
lemma assumes measure-space M and a ∈ nnfis f M
  shows nnfis-rv: f ∈ rv M

The converse does not hold of course, since there are measurable functions whose 
integral is infinite. Regardless, it is possible to approximate any measurable 
function using simple step-functions. This means that all nonnegative random 
variables are quasi integrable, as the property is sometimes called, and brings 
forth the fundamental insight that a nonnegative function is integrable if and 
only if it is measurable and the integrals of the simple functions that approximate 
it converge monotonically. Technically, the proof is rather complex, involving 
many properties of real numbers. 

lemma assumes measure-space M and f ∈ rv M and nonnegative f
  shows rv-mon-conv-sfis: ∃u x. u↑f ∧ (∀n. x n ∈ sfis (u n) M)
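One common textbook way to obtain such an approximating sequence is dyadic rounding from below, capped at n. The Python sketch below shows that construction; it is an assumption that it resembles the sequence used in the formal proof, and step_approx and the sample function f are names introduced here.

```python
import math
from fractions import Fraction


def step_approx(f, n):
    """n-th dyadic step function below a nonnegative f:
    round f(w) down to a multiple of 2**-n and cap the result at n."""
    def u(w):
        dyadic = Fraction(math.floor(f(w) * 2 ** n), 2 ** n)
        return min(Fraction(n), dyadic)
    return u


def f(w):
    """Hypothetical nonnegative sample function on 0..9."""
    return Fraction(w, 3)
```

Each step_approx(f, n) takes only finitely many values, the sequence increases with n, and it stays below f, which is the pointwise behaviour that monotone convergence asks for.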

The following dominated convergence theorem is an easy corollary. It can be 
effectively applied to show integrability. 

corollary assumes measure-space M and f ∈ rv M
  and b ∈ nnfis g M and f ≤ g and nonnegative f
  shows nnfis-dom-conv: ∃a. a ∈ nnfis f M ∧ a ≤ b

Having spoken about integrability all along, it is time to define it at last.

constdefs
  integrable:: ('a ⇒ real) ⇒ ('a set set * ('a set ⇒ real)) ⇒ bool
  integrable f M ≡ measure-space M ∧
    (∃x. x ∈ nnfis (pp f) M) ∧ (∃y. y ∈ nnfis (np f) M)

  integral:: ('a ⇒ real) ⇒ ('a set set * ('a set ⇒ real)) ⇒ real (∫ - d-)
  integrable f M ⟹ ∫ f dM ≡ (THE i. i ∈ nnfis (pp f) M) -
    (THE j. j ∈ nnfis (np f) M)

A useful lemma follows, which helps lift nonnegative function integral sets to 
integrals proper. The dominated convergence theorem from above is employed 
in the proof. 

lemma nnfis-minus-nnfis-integral:
  assumes a ∈ nnfis f M and b ∈ nnfis g M and measure-space M
  shows integrable (λt. f t - g t) M and ∫ (λt. f t - g t) dM = a - b

Armed with this, the standard integral behavior should not be hard to derive. 
Mind that integrability always implies a measure space, just like random vari- 
ables did in 2.4. 

theorem assumes integrable f M
  shows integrable-rv: f ∈ rv M

theorem integral-char: assumes measure-space M and A ∈ measurable-sets M
  shows ∫ χ A dM = measure M A and integrable χ A M

theorem integral-add: assumes integrable f M and integrable g M
  shows integrable (λt. f t + g t) M
  and ∫ (λt. f t + g t) dM = ∫ f dM + ∫ g dM

theorem assumes integrable f M and integrable g M and f ≤ g
  shows integral-mono: ∫ f dM ≤ ∫ g dM

theorem integral-times: assumes integrable f M
  shows integrable (λt. a*f t) M and ∫ (λt. a*f t) dM = a * ∫ f dM

To try out our definitions in an application, only one more theorem is missing.
The famous Markov-Chebyshev inequality is not difficult to arrive at using the
basic integral properties.

theorem assumes integrable f M and 0 < a and integrable (λx. |f x| ^ n) M
  shows markov-ineq: law M f {a..} ≤ ∫ (λx. |f x| ^ n) dM / (a^n)
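On a finite probability space the integral is a weighted sum, so both sides of the inequality can be computed exactly. The following Python sketch does this for a fair die; integral, markov_sides and the shifted outcome function are assumptions made for the example.

```python
from fractions import Fraction


def integral(universe, weight, f):
    """Integral of f over a finite space: a weighted sum."""
    return sum(weight[w] * f(w) for w in universe)


def markov_sides(universe, weight, f, a, n):
    """Return (law of f on {a..}, integral of |f|^n divided by a^n)."""
    lhs = sum(weight[w] for w in universe if f(w) >= a)
    rhs = integral(universe, weight, lambda w: abs(f(w)) ** n) / Fraction(a) ** n
    return lhs, rhs


# Hypothetical example: a fair die, shifted so that negative values occur.
die = range(1, 7)
weight = {w: Fraction(1, 6) for w in die}
lhs, rhs = markov_sides(die, weight, lambda w: w - 3, a=2, n=2)
```

For this choice the left-hand side is the probability of rolling a 5 or a 6, and the right-hand side is the second moment of the shifted outcome divided by 4.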



4 Probabilistic Algorithms 

To take up a point from the prologue, one major motive for formalizing integra- 
tion is to formalize expectation. Indeed, the expectation of a random variable is 
nothing but its integral. This simple fact makes it possible to use all the theo- 
rems about integration to manipulate expected values. In the application I chose, 
only two properties are needed, namely additivity and the Markov inequality.
The latter gives rise to the so-called first moment method⁹. Before going into
the details of the use case, a concrete probability space is required. 

4.1 The Probability Space 

theory ImportPredSet = HOL4ExtraProb + Measure:

It is at this point that real HOL4 theories from Hurd’s thesis [7] come into play. 
They have been imported to Isabelle/HOL by Sebastian Skalberg using his Im- 
port Tool [13]. Joe Hurd has formalized the probability space of independent 
identically distributed infinite Bernoulli trials, or random bitstreams. It is ap- 
plied in theorems about probabilistic functional programs employing monadic 
notation. These can be built from three primitives only: sdest hands back a 
tuple consisting of the first bit of the argument bitstream and the rest of this 
bitstream. UNIT lifts the first argument value to the monad by just pairing it 
with the unmodified second argument boolean sequence. BIND is the monadic 
equivalent of functional composition. 

The main problem in incorporating Hurd’s theories (in the imported form) is 
that in HOL4, predicates are used instead of sets. Therefore, a little work is 
required to switch between the equivalent but technically different variants of 
probability space definitions for example. 

⁹ This is a standard technique in the field of randomized algorithms; it may be found
in the authoritative textbook on the subject by Motwani and Raghavan [8].



4.2 A New Primitive 
theory Lsdest = HOL4ExtraProb:

It is time to introduce the example application that is being formalized in section
4.3. We will be looking at the most simplistic possible program for finding a
satisfying assignment for a propositional formula in conjunctive normal form
where any clause consists of exactly k literals. This problem is known as k-SAT.
Our algorithm simply selects a random assignment for all of the n variables. 
We are interested in the probability that the assignment fails to satisfy a given 
clause. The reasons behind this will become clear in a while. 

In the previous section it was stated that one should be able to construct any 
randomized functional program from the three primitive building blocks defined 
there. Of course, this also holds for the program we have in mind. Nevertheless, 
when following the style these constructs suggest, taking one random bit at a 
time and evaluating somewhere in between, one runs into problems. That is to 
say, the clauses are not independent in general. A variable may appear in several 
clauses, and it would be wrong to fetch a new bit from the stream every time 
it is evaluated. Ergo, the simplest way to perform the evaluation of a clause 
independently from the rest is to get a list of all n random bits beforehand. A 
function accomplishing this is not hard to devise. It is elementary enough to 
possibly support a lot of programs. 

types 'a seq = nat ⇒ 'a

consts
  lsdest:: nat ⇒ 'a seq ⇒ ('a list * 'a seq)

primrec
  lsdest 0 = UNIT []
  lsdest (Suc n) = BIND sdest (λx. BIND (lsdest n) (λl. UNIT (x#l)))

The decisive theorem about this new function furnishes all we need to know
about the probability distribution of its results¹⁰.

lemma lsdest-probs: ⟦finite R; card R = k; ∀r∈R. Suc r ≤ n⟧
  ⟹ P (λs. ∀r∈R. (fst (lsdest n s))!r = b r) = (1/2)^k
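The behaviour of the three primitives and of lsdest can be imitated in Python by modelling a sequence as a function from indices to values, as in the type synonym above. This is a sketch only; unit, bind, sdest and lsdest are lower-case stand-ins written for this illustration and not the imported HOL4 constants.

```python
def unit(a):
    """Pair a value with the unmodified sequence."""
    return lambda s: (a, s)


def bind(f, g):
    """Run f, then feed its result and the remaining sequence to g."""
    def run(s):
        a, rest = f(s)
        return g(a)(rest)
    return run


def sdest(s):
    """Split a sequence into its first element and the shifted rest."""
    return s(0), (lambda i: s(i + 1))


def lsdest(n):
    """Take the first n elements of a sequence as a list."""
    if n == 0:
        return unit([])
    return bind(sdest, lambda x: bind(lsdest(n - 1), lambda l: unit([x] + l)))
```

Applied to the alternating sequence that is True at even positions, lsdest(3) returns the list of the first three bits together with the sequence starting at position 3.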

4.3 The First Moment Method 
theory kSAT = Lsdest + ImportPredSet + Integral:

Formulas are modeled as lists of clauses, which in turn are represented by lists
of integers. The absolute value of a number stands for the variable name, the
sign signifying negation of the literal. For an illustrative instance, -4 means the
4th variable inverted, and 0 is not allowed. A variable may not appear twice
in any clause, as ensured by the absdistinct predicate. Looking at the following
example might clarify the notation.

theorem [[1, 2, 3], [-4, -2, 1]] ∈ 4 var 3 SAT

¹⁰ Here, the exclamation mark operator l!n returns the element number n in the list
l, while P stands for the imported Bernoulli probability measure from 4.1. Thus, a
predicate is measured rather than a set.



A formula can be evaluated at an assignment, that is a list of booleans, by 
primitive recursive functions. 



primrec
  clauseeval [] l = False
  clauseeval (x#xs) l = (if (0<x) then (l!nat (x + -1) ∨ clauseeval xs l)
    else if (x<0) then (¬(l!nat (-1 + -x)) ∨ clauseeval xs l)
    else True)

primrec
  CNFeval [] l = True
  CNFeval (x#xs) l = (clauseeval x l ∧ CNFeval xs l)
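For readers who prefer an executable rendering, the two recursive functions translate directly into Python. The sketch below uses the same integer encoding of literals; clause_eval and cnf_eval are names chosen here, and the sample formula is hypothetical.

```python
def clause_eval(clause, assignment):
    """A clause holds if some literal is satisfied by the assignment.
    Literal x > 0 refers to variable x, x < 0 to its negation."""
    for lit in clause:
        if lit > 0 and assignment[lit - 1]:
            return True
        if lit < 0 and not assignment[-lit - 1]:
            return True
        if lit == 0:  # mirrors the final 'else True' branch
            return True
    return False


def cnf_eval(formula, assignment):
    """A formula in CNF holds if all of its clauses hold."""
    return all(clause_eval(clause, assignment) for clause in formula)


# Hypothetical formula over 4 variables with 3 literals per clause.
formula = [[1, 2, 3], [-4, -2, 1]]
```

Variable numbering starts at 1 while list positions start at 0, which explains the index shift by one in both branches.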



Now we may randomize these functions, obtaining just the simple programs 
described in 4.2. In addition, an indicator variable is defined that takes the 
value 1 for exactly those elementary events — or rather bit sequences — where 
the argument clause is not satisfied. 

constdefs
  randCNFeval :: (int list) list ⇒ nat ⇒ bool seq ⇒ (bool * (bool seq))
  randCNFeval F n s = (CNFeval F (stake n s), sdrop n s)

  randclauseeval :: int list ⇒ nat ⇒ bool seq ⇒ (bool * (bool seq))
  randclauseeval D n = BIND (lsdest n) (λl. UNIT (clauseeval D l))

  indicator :: int list ⇒ nat ⇒ bool seq ⇒ real
  indicator D n = χ{s. ¬ fst (randclauseeval D n s)}

lemma randCNFeval-BIND-UNIT:
  randCNFeval F n s = BIND (lsdest n) (λl. UNIT (CNFeval F l)) s

We just saw that both randclauseeval and randCNFeval can be built from UNIT,
BIND and sdest alone. Hence they are strongly independent functions. In
particular, indicator is a characteristic function for an event.

The next step is to compute the measure of this event, the probability that
a given clause is not satisfied. In spite of the preparatory work on lsdest, the
greatest difficulty lies here. Though a rough idea should have emerged by
now, it is technically demanding to arrive at a setup where lsdest-probs may be
applied directly.

theorem assumes D ∈ n var k clauses
  shows rce-prob: P (λs. ¬ fst (randclauseeval D n s)) = (1/2)^k
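A small enumeration makes the value (1/2)^k plausible. The sketch below assumes uniformly random assignments over n variables and a clause with k distinct variables; prob_unsatisfied is a hypothetical helper, not part of the formal theory.

```python
from fractions import Fraction
from itertools import product

def clause_eval(clause, l):
    """True if some literal of the clause holds under assignment l."""
    return any((x > 0 and l[x - 1]) or (x < 0 and not l[-x - 1])
               for x in clause)

def prob_unsatisfied(clause, n):
    """Fraction of all assignments to n variables that falsify the clause."""
    misses = sum(not clause_eval(clause, l)
                 for l in product([False, True], repeat=n))
    return Fraction(misses, 2 ** n)

# A clause with k = 3 distinct variables among n = 5.
q = prob_unsatisfied([1, -2, 4], 5)
```

A clause is falsified only when each of its k literals is false, which fixes k bits of the assignment and leaves the rest free.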

More about the concept of strong independence may be found in Hurd's work [7, p.
70ff]. In this context, it just means that with regard to the first component of the
function, the preimage of any set is measurable.

We should take a moment to appreciate this first result. It embodies the gist of
the probabilistic analysis for the randclauseeval randomized algorithm. What is
more, it enables the first application of integration in the following theorem.
Here we encounter an expectation in the true sense for the first time in this
paper. Like any expectation it sums up easily.

theorem sum-ind-int: assumes F ∈ n var k SAT shows
  ∫ (λs. Σ m∈{..(length F)(}. indicator (F!m) n s) d ImportPredSet.bern
    = real (length F)/2^k
  and integrable (λs. Σ m∈{..(length F)(}. indicator (F!m) n s) ImportPredSet.bern

The result just obtained contributes all the information about probabilistic
programs we will need: the expected number of unsatisfied clauses with our
simplistic algorithm is the total number of clauses divided by 2^k. It is only now
that the first moment method comes into play. The point put forward by this
proposition is that if the expected value of a nonnegative random variable is less
than 1, then there must be an event witnessing this. The proof turns out to be
rather elementary from Markov's inequality.

corollary assumes integrable f M and ∫ f d M < 1
  and ImportPredSet.prob-space M
  shows fmm: ∃s. f s < 1

In the application we have in mind, a random bitstream that makes the indicator 
variables sum to a value less than 1 corresponds to a satisfying assignment. 

lemma assumes F ∈ n var k SAT and
  (Σ m∈{..(length F)(}. indicator (F!m) n env) < 1
  shows satisfy: CNFeval F (stake n env)

In the end we have shown that a satisfying assignment always exists if there are
fewer than 2^k clauses in a k-CNF formula.

theorem assumes F ∈ n var k SAT and real (length F) < 2^k
  shows existence: ∃l. CNFeval F l
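The whole argument can be replayed numerically on a small instance. The sketch below is only an illustration under the assumption of uniformly random assignments: it computes the expected number of unsatisfied clauses by enumeration and then searches for an assignment where the count is below 1. The formula F and all function names are made up for the example.

```python
from fractions import Fraction
from itertools import product

def clause_eval(clause, l):
    """True if some literal of the clause holds under assignment l."""
    return any((x > 0 and l[x - 1]) or (x < 0 and not l[-x - 1])
               for x in clause)

def unsatisfied_count(F, l):
    """Sum of the indicator variables of all clauses at assignment l."""
    return sum(not clause_eval(c, l) for c in F)

def expected_unsatisfied(F, n):
    """Mean of unsatisfied_count over all 2^n assignments."""
    total = sum(unsatisfied_count(F, l)
                for l in product([False, True], repeat=n))
    return Fraction(total, 2 ** n)

def witness(F, n):
    """Return an assignment with fewer than one unsatisfied clause, if any."""
    for l in product([False, True], repeat=n):
        if unsatisfied_count(F, l) < 1:
            return list(l)
    return None

# A 3-CNF formula with 7 < 2^3 clauses over 4 variables.
F = [[1, 2, 3], [-1, 2, 3], [1, -2, 3], [1, 2, -3],
     [-1, -2, 4], [-1, 3, -4], [2, -3, 4]]
mean = expected_unsatisfied(F, 4)
l = witness(F, 4)
```

Since the count is a nonnegative integer, a value below 1 means that no clause is unsatisfied, so the witness is a satisfying assignment.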



5 Epilogue 

We have formalized a general approach to integration in the Lebesgue style. 
We managed to systematically establish the integral of increasingly complex 
functions. Of course, the repository of potential supplementary facts is vast. 
Convergence theorems, as well as the interrelationship with differentiation or 
concurrent integral concepts, are but a few examples. They leave ample space 
for subsequent work. 

Though the focus has been on the formal content, another aspect of this
research is as an example application of new prover technology. All proofs have
been carried out in declarative style using Markus Wenzel's Isar language [14].
Unlike Isar, which has been used in several projects, Sebastian Skalberg's
import tool [13] is still under development. It has proven extremely handy as the
missing link from Joe Hurd's HOL4 theories, though differences in terminology
obviously could not be taken care of automatically.

The bern space used in the integration theorems is a set version of Hurd's
Bernoulli predicate space from 4.1.

The full theories are available on the web:
http://www-lti.informatik.rwth-aachen.de/~richter/papers/

To my mind, the example application conveys its point in a satisfactory manner. 
As a side effect, another building block for functional probabilistic programming, 
or what is more, its essential properties, could be obtained. Without a doubt, 
there is an infinite amount of further examples, including more involved varieties 
of the first moment method or the run-time analysis of probabilistic quicksort. 
The latter is work in progress at the time of writing. 
Abstract. Dynamic class loading is an important feature of the Java
virtual machine. It is the underlying mechanism that supports installing
software components at runtime. However, it is also complex. Improperly
written class loaders could undermine the type safety of the Java virtual
machine. Given the importance of security, the current description provided
by the Java virtual machine specification is deficient. It is ambiguous, imprecise
and hard to reason about. In this paper, we suggest a model for the Java
virtual machine, which includes the main features of dynamic class loading
and linking. We formalize the model and prove its soundness in the
HOL system. The soundness theorem demonstrates that our model does indeed
preserve types. Based on the model, we can analyze the behavior
of loading in the virtual machine.



1 Introduction 

Dynamic class loading is an important feature of the Java virtual machine
(JVM). It is the underlying mechanism that supports installing software at run
time. Although Java class loading is powerful, it also creates opportunities
for malicious code. Early versions (1.0 and 1.1) of the Java Development
Kit (JDK) contained a serious flaw in the class loader implementation. Improperly
written class loaders could defeat the type safety guarantee of the Java virtual
machine. For example, Saraswat [17] published a bug related to type spoofing by
use of dynamic class loaders. With the release of JDK 1.2, an important feature,
the loading constraint scheme, was introduced in the JVM specification to fix
Saraswat's problem.

Given the importance of Java security, the current specification [18] of class 
loading is deficient. It is a prose description. Although good by the standards 
of prose, this description is ambiguous, imprecise, and hard to reason about. 
One contribution of our work is that we propose a formal model for Java
class loading and linking and prove its soundness. Moreover, we use a theorem
prover to increase the reliability and maintainability of the formalization, which
is another main contribution of our work.

* Supported by the National Natural Science Foundation of China under Grant

The paper begins with an overview of class loading and typing problems in
section 2. The operational semantics of the model is specified in section 3. The
soundness theorem and some other lemmas are discussed in section 4. Section 5
relates our work to other research. Conclusions are presented in section 6.

2 The Overview of Class Loading and Typing Problems 

Java is the only system that incorporates all of the following features: lazy
loading, type-safe linkage, user-defined class loading policy and multiple
namespaces. The notion of class loader plays a critical role in the security of Java.
Each class is associated with a specific class loader that corresponds to a specific
namespace in the virtual machine. The JVM uses class loaders to load class files and
create class objects. Class loaders are ordinary objects that can be defined in
Java code. They are instances of subclasses of the class java.lang.ClassLoader,
some methods of which are relevant to the presentation and are shown in Figure 1.

class ClassLoader {
    public Class loadClass(String name);
    protected final Class defineClass(String name, byte[] buf, int off, int len);
}

Fig. 1. Some methods of java.lang.ClassLoader

If C is the result of L.loadClass(), we say that L initiates loading of C
or, equivalently, that L is an initiating loader of C. If C is the result of
L.defineClass(), we say that L defines C or, equivalently, that L is the
defining loader of C. Class loading can be delegated. One class loader may delegate
to another class loader to load a class. Thus, the loader that initiates the
loading is not necessarily the same loader that completes the loading and defines the
class.
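The distinction between initiating and defining loaders can be made concrete with a toy model. The following sketch is an assumption-laden simplification, not the JVM algorithm: the Loader class, its fields and the rule of delegating only when a class is not local are all hypothetical.

```python
class Loader:
    """Toy class loader that may delegate to a parent loader."""

    def __init__(self, name, parent=None, local_classes=()):
        self.name = name
        self.parent = parent
        self.local_classes = set(local_classes)
        self.initiated = {}     # class name -> (class name, defining loader)

    def define_class(self, cname):
        # The defining loader becomes part of the class identity.
        return (cname, self.name)

    def load_class(self, cname):
        if cname in self.initiated:
            return self.initiated[cname]
        if cname in self.local_classes:
            cls = self.define_class(cname)
        elif self.parent is not None:
            cls = self.parent.load_class(cname)     # delegation
        else:
            raise LookupError(cname)
        self.initiated[cname] = cls                 # self initiated the load
        return cls

L2 = Loader("L2", local_classes={"R"})
L1 = Loader("L1", parent=L2)
c = L1.load_class("R")
```

Here both L1 and L2 are initiating loaders of R, while only L2 is its defining loader.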

A run-time class type is determined not by its name alone, but by a pair: its
fully qualified name and its defining class loader. In this paper, we represent a
class with the notation <N, L1>^{L2}, where N denotes the name of the class, L1
denotes the defining loader, and L2 denotes the initiating loader. When we do
not care about the defining loader, the notation is abbreviated to N^{L2}. When
we do not care about the initiating loader, the notation is abbreviated to
<N, L1>.
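A minimal sketch of this identity rule, assuming a run-time type is just the pair of name and defining loader; RuntimeClass and assignable are hypothetical names used only for illustration.

```python
from typing import NamedTuple

class RuntimeClass(NamedTuple):
    """A run-time class type: <N, L1> in the notation of the text."""
    name: str
    defining_loader: str

def assignable(target: RuntimeClass, value: RuntimeClass) -> bool:
    # Simplification: only identical run-time types are compatible here.
    return target == value

r_in_l1 = RuntimeClass("R", "L1")
r_in_l2 = RuntimeClass("R", "L2")
same_name = r_in_l1.name == r_in_l2.name
compatible = assignable(r_in_l1, r_in_l2)
```

Two classes with the same name but different defining loaders are therefore different types, which is the root of the type-spoofing problem.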

Due to class delegation between loaders, a type-spoofing problem was first
published by Saraswat [17]. Figure 2 presents the problem. The code itself is
entirely correct at compile time. However, a type inconsistency will occur at the
statement r = rr.getR() during run time, since r has the type <R, L1> at run time,
but rr.getR() returns the type <R, L2> due to the class loading delegation from
L1 to L2. Thus, the program can access the private value in <R, L2>
through System.out.println("private value of R in class file R2 = " + r.secret), which
violates the type system of Java.
class RT {   // the defining loader of RT is L1
    private R r;
    void test() {
        RR rr = new RR();
        r = rr.getR();   // type inconsistency, fail
        System.out.println("private value of R in class file R2 = " + r.secret);
    }
}

class RR {   // the defining loader of RR is L2
    R getR() {
        return new R();
    }
}

class R {   // the defining loader of R is L1
    public int secret;
}

class R {   // the defining loader of R is L2
    private int secret;
}

Fig. 2. The type spoofing problem



3 Formal Model 

We propose a state transition system to specify the operational semantics of
class loading and linking, which is precise enough to describe and analyze the
loading operations of the virtual machine. States in our model contain data
structures to specify the inner changes occurring in the virtual machine. These
data structures are the loaded class cache (LCC) and the loading class constraints
(LLC). Thus, our states can be represented as stack × LCC × LLC × heap. There are
two kinds of state transitions in the model: one describes the operations
of instructions; the other specifies the process of loading and linking in the
virtual machine. These transitions are mutually recursive.

3.1 HOL 4 

We formalize our model and prove its soundness in HOL 4, which is the
latest version of the HOL automated proof system for higher order logic. Some
frequently used HOL functions are the following. EXISTS : ('a → bool) → 'a list → bool
is a predicate from the list theory. It determines whether there exists an element
in a list that satisfies the constraint imposed by the predicate ('a → bool).
HD : 'a list → 'a is the standard list processing function to get the first element
of a list. TL : 'a list → 'a list is the list processing function to get the tail of
a list, e.g. TL (h::t) = t. EL : num → 'a list → 'a gets the element of a list which
is indexed by the first argument of the function. The index of the first element is
zero. Thus, EL (n - 1) t gets the nth element of a list t.
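For orientation, the four list functions have direct Python analogues. The definitions below are illustrative stand-ins that only mirror the behaviour described above; they are not the HOL constants.

```python
def EXISTS(p, xs):
    """True if some element of xs satisfies the predicate p."""
    return any(p(x) for x in xs)

def HD(xs):
    """First element of a non-empty list."""
    return xs[0]

def TL(xs):
    """Everything but the first element of a non-empty list."""
    return xs[1:]

def EL(n, xs):
    """Element at zero-based index n."""
    return xs[n]

t = ["a", "b", "c"]
third = EL(3 - 1, t)        # the 3rd element, using the zero-based index 2
```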



3.2 Basic Definitions in HOL 

Because of space limitations, we cannot present all definitions of our model.
Some basic definitions are illustrated in figure 3 and figure 4.

(1) Names of classes, fields and methods

The definitions of CLASSNAME and METHODNAME are straightforward. The
type of field names is defined abstractly.



Hol_datatype
  `CLASSNAME = Class |
               ClassLoader |
               CLSNM of string`;

Hol_datatype
  `METHODNAME = loadClass |
                defineClass |
                MTHDNM of string`;

Hol_datatype
  `INSTRUCTION = areturn |
                 getfield of CLASSNAME # FIELD |
                 getstatic of CLASSNAME # FIELD |
                 invokestatic of CLASSNAME # METHOD |
                 invokevirtual of CLASSNAME # METHOD |
                 putfield of CLASSNAME # FIELD |
                 putstatic of CLASSNAME # FIELD |
                 new of CLASSNAME`;

Hol_datatype
  `CLASS = <|Cls_Loader:LOADER;
             Cls_Name:CLASSNAME;
             Cls_SuperName:CLASSNAME;
             Cls_fld:FIELD list;
             Cls_mthd:METHOD list|>`;

Hol_datatype
  `FIELD = <|Fld_Name:FIELDNAME;
             Fld_Type:CLASSNAME|>`;

Hol_datatype
  `METHOD = <|Mthd_Name:METHODNAME;
              Mthd_arg:CLASSNAME list;
              Mthd_retType:CLASSNAME|>`;

Hol_datatype
  `LCC = <|LCC_loader:LOADER;
           LCC_classname:CLASSNAME;
           LCC_Class:CLASS|>`;

Hol_datatype
  `LLC = <|LLC_L1:LOADER;
           LLC_L2:LOADER;
           LLC_classname:CLASSNAME|>`;

Hol_datatype
  `STACK = <|STK_cls:CLASS;
             STK_mthd:METHOD;
             STK_lvar:VALUE list list;
             STK_pc:num;
             STK_os:VALUE list list|>`;

Hol_datatype
  `PROG_STATE = <|PROG_STATE_stack:STACK list;
                  PROG_STATE_heap:VALUE list;
                  PROG_STATE_lcc:LCC list;
                  PROG_STATE_llc:LLC list|>`;

Fig. 3. Some basic data structures defined in HOL



(2) Instructions 

According to the JVM specification, each of the instructions getfield, getstatic,
putfield and putstatic has two operands generated at compile time. These operands are
used to construct an index into the runtime constant pool of the current class.
The runtime constant pool item at that index must be a symbolic reference to
a field, which gives the name and descriptor of the field as well as a symbolic
reference to the class in which the field is to be found. In our model, the name and
descriptor of the field correspond to the second argument FIELD of the instruction;
the symbolic reference to the class corresponds to the first argument CLASSNAME.
The argument CLASSNAME # METHOD of invokestatic and invokevirtual can be
understood similarly.
(3) Class, Objects and Heap

java.lang.Class plays an essential role in the JVM architecture. This class
implements the reflection mechanism of Java. An instance of java.lang.Class
keeps the meta information of a Java class. In our model, Class is defined
as a record type that consists of a defining loader, a class name, a direct
superclass name, as well as field and method declarations.

In our model, the type of objects is defined abstractly as a type VALUE.
Thus the type of the heap is VALUE list. To talk about the internal structures
(such as fields and methods) of an object, we define an abstract function objcls :
VALUE → CLASS, which models the reflection of Java to return the meta
information of an object. Thus we can build a connection between an abstract object
and its corresponding Class object.

(4) Loaded class cache (LCC) list 

The virtual machine maintains two kinds of consistency. One is temporal
namespace consistency; the other is the consistency between the loaded class cache and
the loading class constraints. Temporal namespace consistency means that the
virtual machine must be able to obtain the same class type for a given class name
and loader at any time. However, mistakes in a user-defined loadClass method
or malicious code may violate the constraint. Therefore, the virtual machine
must check the consistency for every loaded class. This is implemented by
maintaining an inner data structure, the LCC list, in the virtual machine. The type
LCC is defined as a record type which models a map from an initiating loader
and a class name to the corresponding Class object.
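Temporal namespace consistency can be sketched as a cache that refuses to bind the same loader and name to two different classes. The class LoadedClassCache and its methods are hypothetical; the model in this paper uses a plain list of LCC records instead.

```python
class LoadedClassCache:
    """Toy cache keyed by (initiating loader, class name)."""

    def __init__(self):
        self.entries = []       # list of (loader, class name, class) records

    def query(self, loader, cname):
        """Return the class recorded for (loader, cname), or None."""
        for l, n, c in self.entries:
            if l == loader and n == cname:
                return c
        return None

    def record(self, loader, cname, cls):
        """Record a loaded class, rejecting a conflicting second answer."""
        existing = self.query(loader, cname)
        if existing is not None and existing != cls:
            raise TypeError("temporal namespace consistency violated")
        if existing is None:
            self.entries.insert(0, (loader, cname, cls))

cache = LoadedClassCache()
cache.record("L1", "R", ("R", "L2"))
found = cache.query("L1", "R")
```

A second attempt to record a different class for the same loader and name raises an error, which is the check a misbehaving loadClass method would trip.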

(5) Loading class constraints (LLC) list 

In Java 2, a loading class constraint scheme was introduced to fix the type-spoofing
problem in the virtual machine. It has the advantages of ensuring type-safe linkage
and preserving lazy class loading. An LLC record <L1, L2, N> represents a
loading class constraint imposed on the virtual machine, which means N^{L1} = N^{L2}.

(6) Program state 

From the definition, a program state is composed of a stack list, a heap, an LCC
list and an LLC list. Each stack record consists of the current class, the current
method, the program counter, the local variable list and the operand stack list.

The tricky point here is that the types of the local variables and the operand
stack are VALUE list list. These structures merely make the model more
convenient to define. For example, some bytecode instructions require more than one
operand, so the top element (a list) of the operand stack (a list
list) in the initial state can contain all operands to be processed by the current
instruction. The operand stack is thus structured so that each of its elements
holds the operands of one bytecode instruction. In our formalization, we just model
the operand stack and the local variables as object containers. However, we do not
try to impose any constraints on the implementation.







Tian-jun Zuo, Jun-gang Han, and Ping Chen 



(subtyping)
sub c c' lcc = EXISTS (eq_subc c) lcc ∧ EXISTS (eq_subc c') lcc
               ⊃ (EXISTS (eq_sublcc c.Cls_Loader c.Cls_SuperName c') lcc);

(method resolution)
(mthd_RESOLUTION cls mthd [] = cls) ∧
(mthd_RESOLUTION cls mthd (x::t) =
   if (EXISTS (eq mthd) cls.Cls_mthd)
   then cls
   else if (cls.Cls_SuperName = x.LCC_Class.Cls_Name) ∧
           (cls.Cls_Loader = x.LCC_loader)
        then mthd_RESOLUTION x.LCC_Class mthd t
        else mthd_RESOLUTION cls mthd t);

(LCC list and LLC list consistency)
wf_constraint lcc llc =
  ∀l l' n c c'.
    ¬((EXISTS (wf_eqlcc l n c) lcc) ∧
      (EXISTS (wf_eqlcc l' n c') lcc) ∧
      (EXISTS (wf_eqllc l l' n) llc) ∧
      ¬(c = c'));

Fig. 4. Some basic functions and predicates defined in HOL

(7) subtyping 

The predicate sub : CLASS → CLASS → LCC list → bool defines the subtyping relation
between classes. It asserts that if C is the subclass of C' in the context of the LCC
list, then C' = C.Cls_SuperName^{C.Cls_Loader}, where C.Cls_Loader is the defining
loader of C. eq_subc and eq_sublcc are predicates applied to the predicate EXISTS.
The transitive closure of the subtyping relation is defined as the relation
subtc : CLASS → CLASS → LCC list → bool.

(8) LCC list and LLC list consistency 

To preserve type safety, the consistency between the loaded class cache and
the loading class constraints is maintained by the virtual machine. Every time
either of them is modified, the virtual machine checks both to guarantee the
consistency.

The predicate wf_constraint : LCC list → LLC list → bool defines the consistency
between the LCC list and the LLC list. It asserts that the consistency is
satisfied iff the following conditions do not hold at the same time:

- There exists a loader L such that L has been recorded by the virtual machine
  as an initiating loader of a class C denoted by N.

- There exists a loader L' such that L' has been recorded by the virtual
  machine as an initiating loader of a class C' denoted by N.

- The equivalence relation defined by the (transitive closure of the) set of
  imposed constraints implies N^{L} = N^{L'}.

- C ≠ C'.
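The check can be sketched over plain tuples. This is a simplified reading that follows the direct formulation of wf_constraint in figure 4 and deliberately does not compute the transitive closure of the constraints; the tuple layouts are assumptions made for the example.

```python
def wf_constraint(lcc, llc):
    """Return True if the cache lcc is consistent with the constraints llc.

    lcc: list of (initiating loader, class name, class) records.
    llc: list of (loader1, loader2, class name) constraints.
    """
    for (l, n, c) in lcc:
        for (l2, n2, c2) in lcc:
            # A violation needs the same name, a constraint linking the two
            # loaders for that name, and two different classes.
            if n == n2 and (l, l2, n) in llc and c != c2:
                return False
    return True

lcc = [("L1", "R", ("R", "L1")), ("L2", "R", ("R", "L2"))]
violated = wf_constraint(lcc, [("L1", "L2", "R")])
unconstrained = wf_constraint(lcc, [])
```

With the constraint in place the two different classes named R are rejected, which is exactly the situation of the type-spoofing example.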

(9) Method and field resolution algorithm 

Since the LCC list keeps the linearization of the class hierarchy loaded in the
virtual machine, the algorithm mthd_RESOLUTION : CLASS → METHOD → LCC list
→ CLASS traverses the LCC list recursively to resolve the specified method. The
details of method resolution will be discussed in the next section. The
field resolution algorithm fld_RESOLUTION : CLASS → FIELD → LCC list → CLASS is
similar to mthd_RESOLUTION.
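The traversal can be paraphrased iteratively. The sketch follows the recursion shown in figure 4 but simplifies methods to bare names and assumes the LCC list places superclasses after their subclasses; the record types Class and LCC below are hypothetical Python stand-ins.

```python
from typing import NamedTuple, Tuple

class Class(NamedTuple):
    loader: str
    name: str
    super_name: str
    methods: Tuple[str, ...]

class LCC(NamedTuple):
    loader: str
    classname: str
    cls: Class

def mthd_resolution(cls, mthd, lcc):
    """Walk the LCC list, moving to the superclass until mthd is declared."""
    for entry in lcc:
        if mthd in cls.methods:
            return cls                              # declared here: done
        if cls.super_name == entry.cls.name and cls.loader == entry.loader:
            cls = entry.cls                         # step to the superclass
    return cls                                      # list exhausted

Object = Class("L", "Object", "", ("toString",))
A = Class("L", "A", "Object", ("run",))
B = Class("L", "B", "A", ())
lcc = [LCC("L", "A", A), LCC("L", "Object", Object)]
found_run = mthd_resolution(B, "run", lcc)
found_to_string = mthd_resolution(B, "toString", lcc)
```

As in the figure, the superclass is looked up under the loader of the current class, so the walk stays within one namespace at each step.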



3.3 Operational Semantics 

We have formalized state transitions of instructions presented in section 3.1, 
as well as class loading and resolution of classes, fields and methods. Because 
of space limitation, we cannot show all the transitions. We only present such 
representative transitions as invokevirtual, class loading, and method resolution 
to illustrate the outline of the class loading and linking of the virtual machine. 
However, we have to refer to some other transitions because these relations are 
mutually recursive. 

In our formalization, these state transitions are defined by the pre-defined ML
function Hol_reln. Since these HOL definitions are too lengthy and hard to read,
we present the transitions in the usual mathematical way, as a
conjunction of hypotheses implying a conclusion. In the following
transitions, all the terms above the line are the hypotheses and the term below the
line is the conclusion. In these definitions, all functions defined in HOL respect
their type declarations. For readability, all the labels in record types are
omitted.

(1) invokevirtual 

The bytecode instruction invokevirtual invokes an instance method. Its semantics
is shown in figure 5. We first examine the conclusion. The arrow → represents the
state transition relation of bytecode instructions. The initial state asserts that
the current class is Curcls, the current method is Curmthd, and the arguments of
the invoked method are pushed on the top of the operand stack. The subsequent
state asserts that a new active record <|c'; c'.Cls_mthd; [objref, arg1, arg2, ..., argn]
:: lvar'; 0; [[]]|> is created after the state transition, where c' is the class in which
the invoked method is selected and objref is the object reference (the this object in
the Java programming language).

PC(Curmthd, pc) = invokevirtual (refcn, refmthd)
E'.ENV_CLS = Curcls
E'.ENV_CN = refcn
E'.ENV_MTHD = refmthd
(E', <|stack; hp; lcc; llc|>) MR (E', <|stack; hp'; lcc'; llc'|>)
mthd_RESOLUTION (objcls objref) refmthd lcc' = c'
subtc (objcls objref) (querylcc lcc' Curcls.Cls_Loader refcn).LCC_Class lcc'
cond
--------------------------------------------------------------------------------
<|<|Curcls; Curmthd; lvar; pc; [argn, ..., arg2, arg1, objref] :: os|> :: stack; hp; lcc; llc|> →
<|<|c'; c'.Cls_mthd; [objref, arg1, arg2, ..., argn] :: lvar'; 0; [[]]|> ::
  <|Curcls; Curmthd; lvar; pc+1; os|> :: stack; hp'; lcc'; llc''|>

where
cond = (¬(subtc c' (querylcc lcc' Curcls.Cls_Loader refcn).LCC_Class lcc') ⊃ (llc'' = llc')) ∨
       (subtc c' (querylcc lcc' Curcls.Cls_Loader refcn).LCC_Class lcc' ⊃
        (c'.Cls_mthd override refmthd ∧ wf_constraint lcc' llc'' ∧ llc'' = t :: llc'))
t = <|c'.Cls_Loader; Curcls.Cls_Loader; refmthd.Mthd_arg|> ::
    <|c'.Cls_Loader; Curcls.Cls_Loader; refmthd.Mthd_retType|>

Fig. 5. The invokevirtual instruction

The first line of hypotheses asserts that the current instruction of the method
Curmthd is invokevirtual, where the function PC : METHOD → num → INSTRUCTION
returns the instruction pointed to by the program counter. According to the
specification, the execution of the instruction invokevirtual is composed
of the following processes:

- Method resolution. The named method is resolved by the term (E', <|stack; hp;
  lcc; llc|>) MR (E', <|stack; hp'; lcc'; llc'|>), where MR is the transition relation
  of method resolution and E' defines the context of method resolution.

- Method selection. This process determines whether the method is
  overridden in a class declaration. Let C be the class type of objref;
  the actual method to be invoked is selected by the term mthd_RESOLUTION
  (objcls objref) refmthd lcc' = c', which can be summarized as follows:

  • If C contains a declaration for an instance method with the same name
    and descriptor as the resolved method, and the resolved method is
    accessible from C, then this is the method to be invoked, and the lookup
    procedure terminates.

  • Otherwise, if C has a superclass, this same lookup procedure is performed
    recursively using the direct superclass of C; the method to be invoked is
    the result of the recursive invocation of this lookup procedure.

- Invoke the selected method.

In the relation, the term subtc (objcls objref) (querylcc lcc' Curcls.Cls_Loader
refcn).LCC_Class lcc' is a type-safety condition. Let us examine the three arguments
of the predicate subtc. The first argument (objcls objref) computes the Class
object C of the receiver objref. The second argument (querylcc lcc' Curcls.Cls_Loader
refcn).LCC_Class computes the Class object B denoted by refcn. The third
argument lcc' represents the LCC list that has been updated by the method resolution
MR. Intuitively, this term requires that the Class object C of the receiver be
a subclass of B. To understand the motivation of the condition, consider
an example: if there is a statement obj.test() in a Java program, then the type
system requires that the run-time type of obj be a subclass of the
type of obj declared in the program. Since class loading introduces run-time
namespaces, we use subtc to express this requirement.

According to the JVM specification, if <C, L> overrides a method T0 method
(T1, T2, ..., Tn) declared in <B, L'>, then a set of constraints T0^{L} = T0^{L'}, ...,
Tn^{L} = Tn^{L'} should be added to the LLC list. The term cond ensures this requirement.
The predicate override : METHOD → METHOD → bool determines whether one
method can override another by their method descriptors. wf_constraint lcc' llc''
checks the consistency between the LCC list and the LLC list, and t is the constraint
added to the LLC list.

querylcc : Loader → CLASSNAME → LCC returns the record with the specified
loader and class name in the LCC list.
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(2) Class loading 

There are two types of loaders: user-defined loaders and the bootstrap loader 
supplied by the virtual machine. When the virtual machine starts up, the boot- 
strap loader is first used to load classes. Although both types of loaders are 
formalized in our model, the relation LOAD in figure 6 only defines the class 
loading by user-defined loaders. Relation LOAD contains multiple loaders. 

The class loading relation LOAD specifies the process of class loading as 
follows: 



L = E.ENV_CLS.Cls_Loader
N = E.ENV_CN
¬(EXISTS (eqln L N) lcc)
cond1 ∧ cond2 ∧ cond3 ∧ cond4
--------------------------------------------------------------------------------
(E, <|stack; hp; lcc; llc|>) LOAD (E, <|stack; hp; <|L; N; C|> :: lcc''; llc|>)

where
cond1 = PC(prev_m', pc') = invokevirtual (refcn', m') ∧
        m'.Mthd_Name = defineClass ∧
        prev_m'.Mthd_Name = loadClass ∧
        ¬(EXISTS (eqln L' N) lcc) ∧
        (<|<|C'; prev_m'; prev_lv'; prev_pc'; [N, L'] :: prev_os'|> :: stack; hp; lcc; llc|>
         → <|<|Class; m'; [L', N] :: lv; 0; [[]]|> ::
             <|C'; prev_m'; prev_lv'; prev_pc'+1; prev_os'|> :: stack; hp; lcc; llc|>)
cond2 = PC(m', pc) = areturn ∧
        C.Cls_Loader = L' ∧
        (<|<|Class; m'; lv'; pc; [C] :: os|> ::
           <|C'; prev_m'; prev_lv'; prev_pc'+1; prev_os'|> :: stack; hp; lcc'; llc|> →
         <|<|C'; prev_m'; prev_lv'; prev_pc'+1; [C] :: prev_os'|> :: stack; hp; lcc''; llc|>) ∧
        wf_constraint lcc'' llc
cond3 = PC(m'', pc'') = areturn ∧
        m''.Mthd_Name = loadClass ∧
        prev_C'' = E.ENV_CLS ∧
        (<|<|C''; m''; lv''; pc''; [C] :: os''|> ::
           <|prev_C''; prev_m''; prev_lv''; prev_pc''; prev_os''|> :: stack; hp; lcc'; llc|> →
         <|<|prev_C''; prev_m''; prev_lv''; prev_pc''; [C] :: prev_os''|> ::
           stack; hp; <|L; N; C|> :: lcc''; llc|>) ∧
        wf_constraint (<|L; N; C|> :: lcc'') llc
cond4 = (E'.ENV_CLS = C) ∧
        (E'.ENV_CN = C.Cls_SuperName) ∧
        ((E', <|stack; hp; lcc; llc|>) CR (E', <|stack; hp; lcc'; llc|>))
lcc'' = <|L'; N; C|> :: lcc'

Fig. 6. The class loading relation



- The term ¬(EXISTS (eqlcn L N) lcc) in the relation is used to maintain the
  temporal namespace consistency. It requires that L has not been recorded as an
  initiating loader of a class denoted by N in the virtual machine. In other words,
  it requires that N has not been loaded by L. Otherwise, no class loading is
  necessary.

- Intuitively, when there are class loading delegations in a program, there exists
  a nest of loadClass method invocations, which can be written as loadClass() {...;
  loadClass() {...; defineClass()}}. According to the JVM specification,
  however, the soundness of the delegation model is determined only by the last
  loadClass invocation, because the last loadClass invokes the defineClass that
  defines Class objects. Therefore, we need only formalize the first loadClass and
  the last loadClass to prove the soundness of the class loading delegation. The
  delegation process is specified by cond1, cond2 and cond3.

  • cond1 specifies the invocation of defineClass by the last loadClass. It
    is a conjunction of five hypotheses. Informally, hypotheses 1 to 3 assert that
    the current method is loadClass, the current instruction is invokevirtual,
    and it invokes defineClass. Hypothesis 4 ensures that loader L' has
    not been recorded as an initiating loader of a class denoted by N.
    Hypothesis 5 specifies the state transition of the invocation.

  • cond2 specifies the return from the invocation of defineClass. It is a
    conjunction of four hypotheses. Informally, the first hypothesis asserts that the
    current instruction is areturn. Hypotheses 2 and 3 assert that a Class object
    C is created on the top of the operand stack and that the defining loader
    of C is L'. Meanwhile, a record <L'; N; C> is added to the loaded class
    cache. The fourth hypothesis checks the consistency between the LCC
    list and the LLC list.

  • cond3 specifies the return from the invocation of the first loadClass. It is a
    conjunction of five hypotheses. Informally, hypotheses 1 and 2 assert that the
    active record is loadClass and the current instruction is areturn. The
    fourth hypothesis asserts that a Class object C is returned to the last active
    record and a record <L; N; C> is added to the loaded class cache. The
    fifth hypothesis checks the consistency between the LCC list and the
    LLC list.

- During the invocation of defineClass, the virtual machine
  determines whether C has a direct superclass. If it has, the symbolic reference
  from C to its direct superclass is resolved using the class resolution relation CR.
  cond4 defines this process. It is a conjunction of three hypotheses. Hypotheses
  1 and 2 define the context of the resolution. Hypothesis 3 asserts a state
  transition of the class resolution.

- The conclusion of relation LOAD asserts the state transition of class loading.
(3) Method resolution 

Resolution is the process of dynamically determining concrete values from sym- 
bolic references in the runtime constant pool. Certain Java virtual machine in- 
structions require specific linking checks when resolving symbolic references. For 
instance, in order for an invokevirtual instruction to successfully resolve the sym- 
bolic reference to the method on which it operates it must complete the method 
resolution with relation MR . Method resolution is defined in figure 7. 

Intuitively, if a class refers to a symbolic method refmthd in refcn, then the
virtual machine resolves the symbolic references by method resolution. In the
relation MR, E is the context of method resolution. C is the current class
member of the context E. refcn and refmthd are the symbolic references to be resolved.
The resolution process is as follows:
C = E.ENV_CLS
refcn = E.ENV_CN
refmthd = E.ENV_MTHD
E'.ENV_CLS = E.ENV_CLS
E'.ENV_CN = E.ENV_CN
(E', <|stack; hp; lcc; llc|>) CR (E', <|stack; hp'; lcc'; llc'|>)
mthd_RESOLUTION cls refmthd lcc' = C'
wf_constraint lcc' llc''
--------------------------------------------------------------------------------
(E, <|stack; hp; lcc; llc|>) MR (E, <|stack; hp'; lcc'; llc''|>)

where
cls = (querylcc lcc' C.Cls_Loader refcn).LCC_Class
llc'' = <|C.Cls_Loader; C'.Cls_Loader; refmthd.Mthd_arg|> ::
        <|C.Cls_Loader; C'.Cls_Loader; refmthd.Mthd_retType|> :: llc'

Fig. 7. The method resolution relation



— If refcn is unresolved, then the term (E',<|stack; hp; lcc; llc|>) CR
(E',<|stack; hp'; lcc'; llc'|>) resolves the reference first. The terms
E'.ENV_CLS = E.ENV_CLS and E'.ENV_CN = E.ENV_CN define the context of class
resolution.

— Let the Class object of refcn be cls, which is computed by the term (querylcc
lcc' C.Cls_Loader refcn).LCC_Class; then the virtual machine invokes algorithm
mthd_RESOLUTION to look up the method refmthd in the class cls and its
superclasses as follows:

• If cls declares the method, method lookup succeeds.

• Otherwise, the lookup is recursively invoked on the direct superclass of
class cls.

— If method lookup succeeds and the referenced method has the form
$T_0\ \mathit{method}(T_1, T_2, \ldots, T_n)$, a set of constraints
$T_0^{L} = T_0^{L'}, \ldots, T_n^{L} = T_n^{L'}$ is added to the LLC list,
where L = C.Cls_Loader and L' = C'.Cls_Loader. The term llc'' ensures the
constraints.

— Predicate wf_constraint checks the consistency between the LCC list and
the LLC list.
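
The lookup and constraint-generation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HOL definition: the names `Cls`, `mthd_resolution` and `resolve_method` are hypothetical, a method is reduced to a pair of argument and return type names, and class resolution (relation CR) and the wf_constraint check are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cls:
    """Hypothetical stand-in for a loaded class."""
    name: str
    loader: str  # defining loader (Cls_Loader in the paper)
    methods: dict = field(default_factory=dict)  # method name -> (arg type, return type)
    superclass: Optional["Cls"] = None


def mthd_resolution(cls, mname):
    """Return the first class on the superclass chain that declares mname, else None."""
    while cls is not None:
        if mname in cls.methods:
            return cls
        cls = cls.superclass
    return None


def resolve_method(current, cls, mname, llc):
    """Resolve mname starting at cls and prepend two loading constraints,
    one for the argument type and one for the return type, mirroring llc'' in Fig. 7."""
    target = mthd_resolution(cls, mname)
    if target is None:
        return None, llc
    arg_t, ret_t = target.methods[mname]
    new = [(current.loader, target.loader, arg_t),
           (current.loader, target.loader, ret_t)]
    return target, new + llc


# Example: Derived inherits m from Base; the caller was defined by loader L1.
base = Cls("Base", "L2", {"m": ("A", "R")})
derived = Cls("Derived", "L2", {}, base)
caller = Cls("Caller", "L1")
target, llc = resolve_method(caller, derived, "m", [])
```

Here the lookup walks from Derived to Base, and the two constraints record that loaders L1 and L2 must agree on the types named A and R.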

4 Soundness 

We have proven the soundness of the model. The main soundness theorem states
that a well-typed state remains well-typed after being rewritten with the
relations. To state it, we must first define what a well-typed state is.

4.1 Bytecode Verification 

One important feature of the language design of Java is bytecode verification.
It guarantees runtime well-typedness through static checks, which keeps
the type checks at run time to a minimum.

Bytecode verification has a four-pass architecture. Among these passes, the
most complicated is the third one, which performs dataflow analysis on each
method. It is difficult to model this accurately. As a simplification, we
define a function verify : CLASS → VALUE list → CLASSNAME list for bytecode
verification, which returns the related class name of the object in the context
of the current class.

4.2 Well-Typed State

Definition 1 (Well-formed Heap) A heap is well-formed iff any object refer- 
ence objref in the heap satisfies all the following conditions: 

(1) The Class object of objref is in the LCC list. 

(2) Let fld be a field of objref, then the value of fld is either null or an object
reference that satisfies conditions (1) and (2) recursively.

(3) Let the class of objref be C, fld be a field of objref, the class name of fld be
fntype, the value of fld be v, the class of v be V and the class returned by
resolving fld in the context of (objcls objref) with algorithm fld_RESOLUTION be
C', then there exists a transitive closure of the subtyping relation between V and
the class which takes the defining loader of C' as its initiating loader and fntype
as its class name.

Predicate wf_hp : VALUE list → LCC list → bool defines conditions (1)
and (2). Predicate wf_getval_hp : VALUE list → LCC list → bool defines condition
(3). The table look-up function querycls : LCC list → CLASS → LCC returns a
record with a specified class in the LCC list. Predicate fnobj : FIELDNAME →
CLASSNAME → VALUE → bool decides whether there exists a field in an
object reference. get_objval : VALUE → FIELDNAME → VALUE returns the
value of a field in an object.

Definition 2 (Well-formed LCC List) An LCC list is well-formed iff, for every
loaded class C in the LCC list with defining loader L and class name N,
there exists a record in the LCC list with L as its initiating
loader, N as its class name and C as its class type.

Intuitively, definition 2 ensures that the virtual machine has recorded the
defining loader of each loaded class as its initiating loader.
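
As a minimal sketch of definition 2, the check can be written over a list of records in Python. The tuple layouts `Cls` and `LCC` are assumptions made for this illustration and are not the record types of the HOL theory.

```python
from collections import namedtuple

# Hypothetical record layouts: a class knows its defining loader and name;
# an LCC record pairs an initiating loader and a class name with a class.
Cls = namedtuple("Cls", "loader name")
LCC = namedtuple("LCC", "loader name cls")


def wf_lcc(lcc):
    """Every class in the cache is also recorded under its own defining loader and name."""
    return all(LCC(r.cls.loader, r.cls.name, r.cls) in lcc for r in lcc)


c = Cls("L2", "Foo")
good = [LCC("L1", "Foo", c), LCC("L2", "Foo", c)]  # L1 delegated loading of Foo to L2
bad = [LCC("L1", "Foo", c)]                        # entry for the defining loader L2 is missing
```

The list `good` is well-formed because the class defined by L2 also appears with L2 as its initiating loader, whereas `bad` records only the delegating loader.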

Definition 3 (Well-formed LLC List) An LLC list is well-formed iff it is
consistent with an LCC list.

Definition 4 (Well-formed Stack Frame List) A stack frame list is well- 
formed iff all the stack frames in the list satisfy the following conditions: 

(1) Let L be the defining loader of the current class, objref be any object ref- 
erence in the current operand stack list, C be the class of objref and N be 
the class name computed by the application verify objref, then there exists a 
transitive closure of subtyping relation between class C and class C’ that takes 
L as its initiating loader and N as its class name. 

(2) Let L be the defining loader of the current class, v be any object reference 
in the current local variable list, C be the class of v and N be the class name 
computed by the application verify v, then there exists a transitive closure of 
subtyping relation between class C and class C’ that takes L as its initiating 
loader and N as its class name. 
(well-formed heap)

wfm_hp hp lcc = wf_hp hp lcc ∧ wf_getval_hp hp lcc;

wf_hp hp lcc =
  ∀obj fn fntype.
    (EXISTS (eq_wfsetvalhp obj) hp) ⊃ ((¬ (querycls lcc (objcls obj) = CNULL)) ∧
      ((fnobj fn fntype obj)
        ⊃ ((get_objval obj fn = CNULL) ∨ EXISTS (eq_wfsetvalhp (get_objval obj fn)) hp)));

wf_getval_hp hp lcc =
  ∀obj fn fntype c.
    ((EXISTS (eq_wfsetvalhp obj) hp) ∧
     (fnobj fn fntype obj) ∧
     (c = fld_RESOLUTION (objcls obj) <|Fld_Name := fn; Fld_Type := fntype|> lcc))
    ⊃ (subtc (objcls (get_objval obj fn)) (querylcc lcc c.Cls_Loader fntype).LCC_Class lcc);

(well-formed LCC list)

wf_lcc lcc =
  ∀c. (EXISTS (eqwflcc c) lcc) ⊃ (EXISTS (wf_eqlcc c.Cls_Loader c.Cls_Name c) lcc);

(well-formed LLC list)

wf_llc llc = ∀lcc. wf_constraint lcc llc;

(well-formed stack frame list)

wf_state stk =
  (wt (objclslst (HD stk.STK_os)) (verify stk.STK_cls (HD stk.STK_os))
      stk.STK_cls.Cls_Loader) ∧
  (wt (objclslst (HD stk.STK_lvar)) (verify stk.STK_cls (HD stk.STK_lvar))
      stk.STK_cls.Cls_Loader);

(well-typed state)

wt_PROG_state st =
  (wfm_hp st.PROG_STATE_heap st.PROG_STATE_lcc) ∧
  (wf_lcc st.PROG_STATE_lcc) ∧
  (wf_llc st.PROG_STATE_llc) ∧
  (wf_state (HD st.PROG_STATE_stack));

Fig. 8. The definition of well-typed state



Predicate wt : CLASS list → CLASSNAME list → LOADER → bool determines
the transitive closure of the subtyping relation stated in conditions (1) and (2).

Definition 5 (Well-typed State) A state is well-typed iff its heap, LCC list,
LLC list and stack frame list are well-formed.

Figure 8 presents all the definitions in HOL.



4.3 Soundness Theorem 

We first discuss some non-trivial lemmas related to the proof of the main theo- 
rems in figure 9. 

During method invocation, the virtual machine requires that if <C, L> overrides
a method $T_0\ \mathit{methodname}(T_1, T_2, \ldots, T_n)$ declared in <C', L'>,
then a set of constraints $T_0^{L} = T_0^{L'}, \ldots, T_n^{L} = T_n^{L'}$ should
be added to the loading class constraints list. Intuitively, lemma 1 implies that
the invokevirtual rule can impose such constraints correctly.

The hypotheses of lemma 1 are a conjunction of three formulas. Formula 1
asserts the invokevirtual rule holds for states st1 and st2. Formula 2 asserts the
current instruction is invokevirtual. Formula 3 asserts a Class object c can be
obtained by method resolution; where the first argument (querylcc (st2.PROG_STATE_lcc)
Lemma 1 (correctness of method invocation)

∀st1 st2 refcn refmthd.
  (∃c. (IVKRule st1 st2) ∧
    (PC (HD (st1.PROG_STATE_stack)).STK_mthd (HD (st1.PROG_STATE_stack)).STK_pc =
       invokevirtual (refcn, refmthd)) ∧
    (mthd_RESOLUTION
       (querylcc (st2.PROG_STATE_lcc) (HD (st1.PROG_STATE_stack)).STK_cls.Cls_Loader
          refcn).LCC_Class refmthd (st2.PROG_STATE_lcc) = c) ⊃
    (EXISTS (eqlcc <|LLC_L1 := (HD (st1.PROG_STATE_stack)).STK_cls.Cls_Loader;
                     LLC_L2 := c.Cls_Loader; LLC_classname := refmthd.Mthd_arg|>) st2.PROG_STATE_llc));

Lemma 2 (uniqueness of subtyping hierarchy)

∀c1 c2 c3 lcc. (sub c1 c2 lcc) ∧ (sub c1 c3 lcc) ⊃ (c2 = c3);

Lemma 3 (transitivity of field resolution)

∀c1 c2 lcc c fld.
  (subtc c1 c2 lcc) ∧ (fld_RESOLUTION c2 fld lcc = c) ⊃ (fld_RESOLUTION c1 fld lcc = c);

Fig. 9. Some lemmas related to the soundness

(HD (st1.PROG_STATE_stack)).STK_cls.Cls_Loader refcn).LCC_Class computes
the Class object denoted by refcn that is loaded by the defining loader of the
current class of state st1. Intuitively, formula 3 resolves the method refmthd
declared in refcn. The conclusion of the lemma asserts there exists a constraint
<L, L', N> in the LLC list of state st2, where L is the defining loader of the
current class in state st1 and L' is the defining loader of the class c obtained by
method resolution. Relation IVKRule : PROG_STATE → PROG_STATE → bool
defines the state transition of the instruction invokevirtual. The proof requires
lemmas on the structure of the invokevirtual rule and the method resolution
relation, as well as the definition of loading class constraints.

Lemma 2 asserts that if c1 is a subclass of c2 and c1 is also a subclass of c3,
then c2 equals c3, which implies the uniqueness of the subclass hierarchy in the
loaded class cache list. The proof is mainly by induction on the definition of
predicate sub and the structure of the class loading relations.

Lemma 3 asserts that if there is a transitive closure of the subtyping relation
between c1 and c2, then resolving field fld from c1 yields the same result as
resolving it from c2. That is, field resolution is transitive along the subtyping
chain from c1 to c2. The proof requires lemma 2 and induction on the definition
of the recursive function fld_RESOLUTION.

Theorem (Soundness) A well-typed state will still be well-typed if rewritten 
with the relations defined in the model. 

The soundness theorem implies a well-typed program can preserve type safety 
in the model. The proof of the theorem is lengthy. It is a case analysis on all the 
relations. On the top level there are 13 cases, where 

— Six cases can be directly solved by induction on the structure of the relations. 

— The proof of relation invokevirtual requires not only the lemmas on the 
structure of relation invokevirtual and the definition of mthd_RESOLUTION, 
override and wt, but also lemma 1 and related lemmas of other relations. 

— The proof of areturn requires some lemmas of rule invokevirtual and the
induction on the structure of rule areturn, as well as the definition of well- 
formed stack frame list. 
— The proof of getfield can be mainly solved by lemma 3, the lemma of getfield
and the definition of field resolution.

— The proof of putfield requires not only lemma 3 and lemmas of field resolution 
but also the lemma of the structure of putfield and the definition of well- 
formed heap. 

— The proof of other rules is similar to the above rules. 



5 Related Work 

Saraswat [17] first published the type spoofing problem and proposed two
solutions to it. Dean [4] discusses the problem of type safety in class loaders
from a theoretical perspective. He presents a model of dynamic linking that is
closely related to Java and proves its soundness in PVS. Drossopoulou [5] proposes
an abstract model for dynamic linking and verification in Java. That account
is useful for reasoning about the Java source language, but the model does not yet
treat multiple loaders. Jensen [7] gives a formal specification of the dynamic
loading of classes in the Java Virtual Machine and of the visibility of members of
the loaded classes. However, loading and linking are defined abstractly there.
Moreover, there exist inaccuracies in [7].

Tozawa [19] proposes a formalization of the JVM that suffices to analyze the
loading constraint scheme of Java. Tozawa uses an environment to define the
operational semantics of loading and formalizes only the invokevirtual and areturn
instructions. Moreover, Tozawa does not model the loaded class cache explicitly.

Qian [16] proposes a state transition system to describe loading in the
JVM and proves its soundness. The basic differences between that work and ours
are as follows. First, we construct and check our model in HOL. Second, the
specifications of some important transitions, such as invokevirtual and LOAD, are
totally different. For example, Qian does not consider method overriding in the
invokevirtual instruction or the defining loader in class loading. Also, Qian does
not consider the delegation of class loading. Third, since our model is totally
different, so is the proof of soundness.

Fong et al. [6] propose a proof linking architecture to uncouple bytecode 
verification, class loading, and linking. They only consider a single class loader. 

Liu et al. [11] present a virtual machine simulator which is implemented in
a functional subset of Common Lisp. One important feature is that their simulator
can model dynamic class loading, class initialization and synchronization via
monitors. Another striking feature is that the simulator can be treated as a set
of formulas in the ACL2 specification language and reasoned about mechanically.
However, their model does not simulate multiple loaders. Therefore, there are no
multiple run-time namespaces in their model. In essence, their model of class
loading is much simpler than the official specification.

Qian [15] presents a static type system for a large fragment of the Java
bytecode language. As a simplification, he assumes all classes are loaded
by a single class loader. Pusch [14] follows Qian's work [15] and formalizes the
specification of the JVM in Isabelle/HOL.
Nipkow and Klein have done much work on bytecode verification. They
propose an abstract framework for bytecode verification, which can be
instantiated to yield executable verified bytecode verifiers [12, 8, 10]. Klein et al. [9] have
proved correct a compiler for Java from source to bytecode language in Isabelle,
and have also shown that all well-typed programs of the source language are
accepted by the bytecode verifier. Nipkow and Oheimb [13, 20] also formalize the
Java language in Isabelle/HOL. Their proposals are useful for reasoning about
the Java source language. None of this work considers multiple loaders.

There is also much research on the verification of Java Card [8, 3, 2, 1].
Nipkow and Klein [8], using Isabelle/HOL, formalize and prove the soundness
and completeness of lightweight bytecode verification used in the KVM, one of
Sun's embedded variants of the JVM. Barthe et al. [3, 2] formalize the JavaCard
virtual machine and the bytecode verification in the Coq system. Barthe et al. [1]
also describe a package to reason about complex recursive functions in Coq.
They also illustrate how to apply the package to reasoning about the Java Card
platform.

6 Conclusion 

We propose a model for the Java virtual machine. Our model includes the main
features of Java class loading and linking. Compared with prior work,
our proposal considers multiple loaders and the concrete implementation of
class loading and linking. We show how the notion of class loaders is related to
the Java security model. Therefore, our model is precise enough to specify and
reason about class loading formally, and it is the closest to the official specification.
To ensure the correctness of the model, we formalize it in the HOL system. The
theory files sum up to more than 2300 lines. To our knowledge, no other
research formalizes JVM class loading in a theorem prover. The machine-assisted
proof eliminates omissions and inaccuracies, such as type errors and
inconsistencies, in the formalizations. The power of automated reasoning in the
prover is also of great help. Moreover, the expressiveness of higher order logic
makes the model more concise.

We are also working on integrating bytecode verification in our model.
Although there is much work on bytecode verification, almost all of it
focuses on the verification algorithm due to Gosling and Yellin [18]. Also, none
of the previous work considers multiple loaders during bytecode verification.
For Java, a most striking feature is that the verification of type soundness is
carried out at four different times: compiling, loading, linking and run time. The
notion of class loader plays a critical role in the Java 2 security model, which relies
on name spaces to ensure that an untrusted applet cannot interfere with other
Java programs. Therefore, integrating loading, linking and bytecode verification
in a unified model and proving type soundness is a major challenge for
future research.
Abstract. We formalise a simple assembly language with procedures
and a safety policy for arithmetic overflow in Isabelle/HOL. To verify 
individual programs we use a safety logic. Such a logic can be realised 
in Isabelle/HOL either as shallow or deep embedding. In a shallow em- 
bedding logical formulas are written as HOL predicates, whereas a deep 
embedding models formulas as a datatype. This paper presents and dis- 
cusses both variants pointing out their specific strengths and weaknesses. 



1 Introduction 

Proof Carrying Code (PCC), first proposed by Necula and Lee [14,15], is a
scheme for executing untrusted code safely. It works without cryptography and
without a trusted third party. Instead, it places the burden of showing safety
on the code producer, who is obliged to annotate a program and construct a
certificate that it adheres to an agreed upon safety policy. The code consumer
merely has to check if the certificate - a machine-checkable proof - is correct.
This check involves two steps: A verification condition generator (VCG) reduces
the annotated program to a verification condition (VC), a logical formula that
is provable only if the program is safe at runtime. Then a proof checker ensures
that the certificate is a valid proof for the VC. If both VCG and proof checker
work correctly, this scheme is tamper proof. If either the program, its
annotations or the certificate are modified by an attacker, they won't fit or, if
they still do, the resulting program would also be safe. Proof checkers are
relatively small standard components and well researched. The VCG is a different
story. In early PCC systems it is large (23000 lines of C in [8]) and complex. The
formulas it produces are usually influenced by the machine language, the safety
policy and the safety logic. The machine language determines syntax and semantics
of programs. These are considered safe if they satisfy the conditions the safety
policy demands. The safety logic can serve multiple purposes. First, it provides a
formal description language for machine states, which can be used to write
annotations or to specify a safety policy. Second, it is used to express and prove
verification conditions.

For some safety policies, such as checking that all instructions are used on proper
arguments (type safety), a type system could play the role of the safety logic.
VCG and proof checking could be replaced by automatic type inference. A
typical example is Java Bytecode Verification, which is formally verified by now
[12]. To handle more complex properties, for example checking that programs
operate within their granted memory range (memory safety), type systems can
be combined with a logic or extended to a logic-like system [10,13,7].
Foundational proof carrying code tries to prove safety directly in terms of the
machine semantics [2,3], without a VCG or safety logic as an extra layer.

Our approach uses a VCG, but keeps it small and generic. We model this VCG as 
part of an Isabelle/HOL framework for PCC, which can be instantiated to var- 
ious machine languages, safety policies and safety logics. The machine checked 
soundness proof we have for this VCG automatically carries over to the instan- 
tiations. One only has to show that the instantiation meets the requirements 
our framework makes explicit. None of these requirements touches the safety 
policy, which in turn can be replaced without disturbing any proof at all. In ad- 
dition Isabelle/HOL supports the whole range of code producer and consumer 
activities. We can generate ML code [5] for our VCG and use Isabelle/HOL to 
produce and check proof objects for verification conditions [6]. 

By now we have instantiated various nontrivial safety policies, such as
constraints on runtime or memory consumption, and verified various example
programs, including recursive procedures and pointer arithmetic [1]. In this paper
we instantiate a simple assembly language (SAL) with a safety policy that
prevents type errors and arithmetic overflows. Both are kept rather simple. This
paper focuses on the safety logic, which can be embedded in Isabelle/HOL [16]
either in shallow or deep style. In the first, one models safety logic formulas as
HOL predicates on states. The safety logic automatically inherits the
infrastructure of the theorem prover such as its type system and tools for
simplifying or deciding formulas. In the second, one models formulas as a datatype
and defines functions to evaluate or transform them. We discuss both variants and
point out their specific strengths and weaknesses.

2 Execution Platform 

Our simple assembly language (SAL) is a downsized version of TAL [13], which
additionally has indirect jumps, multiple argument passing modes and an
explicit distinction between registers and heap addresses. Since we are
primarily interested in the safety logic and policy, we keep the programming
language rather simple. However, with pointers and procedures SAL already includes
major pitfalls of machine languages. We consider programs as safe if all
instruction arguments have proper type and do not cause arithmetic overflows. Note
that the latter involves reasoning about runtime values and demands an
expressive annotation language. A simple type system does not suffice, because it
can only express what types the results of an instruction or procedure have, not
the relation between input and output values.

2.1 SAL Platform 

In SAL we distinguish two kinds of addresses. Locations, which we model as
natural numbers, identify memory cells, whereas positions identify places in a
program. We denote positions as pairs (pn,i), where i is the relative position
inside a procedure named pn.

types loc = nat, pname = nat, pos = pname × nat

SAL has instructions for arithmetic, pointers, jumps and procedures.

datatype instr = SET loc nat | ADD loc loc | SUB loc loc | MOV loc loc |
JMPL loc loc nat | JMPB nat | CALL loc pname | RET loc | HALT

These instructions, which we explain in §2.2, manipulate states of the form
(p,m,e), where p denotes the program counter, m the memory and e the
environment.

types state = pos × (loc ⇒ tval) × env

The program counter p is the position of the instruction that is executed next. 
The main memory m, which maps locations to typed values, stores all the data 
a program works on. We distinguish three kinds of values: Uninitialised values 
ILLEGAL, natural numbers NAT n, and positions POS (pn,i). 

datatype tval = ILLEGAL | NAT nat | POS pos

The environment e tracks useful information about the run of a program. It is a
record with two fields cs and h and equally named selector functions. To update
a field x in a record r with an expression E we write r(|x:=E|).

record env = cs :: (nat × (loc ⇒ tval)) list
             h :: pos list

An environment e contains a call stack cs e, which lists the times and memory
contents under which currently active procedures have been called, and a history
h e, which traces the values of program counters. We use the environment like
a history variable in Hoare Logic. It is not necessary for machine execution but
valuable for reasoning about execution. We can describe states by relating them
to former states or refer to system resources, e.g., the length of h e is a time
measure.
    SET B b0,
    SET C c0,
    CALL P 1,
    ADD B C,
    HALT ]),

    SET M MAX,
    SUB M C,
    JMPL B M 2,
    SET C 0,
    RET P ])]

Fig. 1. Sample Code

A program is a list of procedures, which consist of a name pname and a list of
possibly annotated instructions. With 'a option we model partiality in
Isabelle/HOL, a logic of total functions. It injects the new element None into a
given type 'a.

'a option = None | Some 'a

types
proc = pname × ((instr × (form option)) list)
prog = proc list

For example Fig. 1 shows a program that safely credits the balance B of a smart
card purse. A procedure checks whether B + C exceeds MAX. If it does, it sets
C to 0, thus preventing an overflow of the following ADD instruction. For better
readability we write {A} ins to denote an instruction/annotation pair (ins,
Some A). Annotations are formulas in the safety logic and have type form. To
access instructions we write cmd Π p, which gives us Some ins if a program Π
has an instruction ins at p, or None otherwise. For example in Fig. 1
we get cmd OD (0,2) = Some (CALL P 1).
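
To make the lookup concrete, the following Python sketch mirrors the types proc and prog and the partial function cmd. It is an illustration only: instructions are encoded as plain tuples, Python's None stands in for the None of the option type, and the annotations of Fig. 1 are replaced by None placeholders because this sketch does not model formulas.

```python
# A program is a list of procedures: (pname, [(instruction, annotation-or-None), ...]).
# Locations are written as names here for readability; the paper models them as naturals.
OD = [
    (0, [(("SET", "B", "b0"), None),
         (("SET", "C", "c0"), None),
         (("CALL", "P", 1), None),
         (("ADD", "B", "C"), None),
         (("HALT",), None)]),
    (1, [(("SET", "M", "MAX"), None),
         (("SUB", "M", "C"), None),
         (("JMPL", "B", "M", 2), None),
         (("SET", "C", 0), None),
         (("RET", "P"), None)]),
]


def cmd(prog, pos):
    """Return the instruction at position (pn, i), or None if there is none."""
    pn, i = pos
    for name, body in prog:
        if name == pn:
            return body[i][0] if 0 <= i < len(body) else None
    return None
```

With this encoding, `cmd(OD, (0, 2))` yields the tuple for CALL P 1, and any position outside the program yields None.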



2.2 Program Semantics 

To formalise the effects of SAL instructions, we use the transition relation effS.

effS Π = {((p,m,e),(p',m',e')) | step Π (p,m,e) = Some (p',m',e')}

For this small step semantics we use step Π (p,m,e), which yields Some (p',m',e')
if the instruction at p yields the successor state (p',m',e'). For example ADD
X Y updates X with (m X) $\hat{+}$ (m Y), which is NAT (x+y) if m X = NAT x
and m Y = NAT y and ILLEGAL otherwise. Here + is addition on naturals (no
overflow) and $\hat{\ }$ lifts operators from natural numbers to typed values. Like
all instructions ADD also extends the history h e. We formalise this using ↦ for
function update and @ for list concatenation:

cmd Π (pn,i) = Some (ADD X Y) ⟶ step Π ((pn,i),m,e) =
  Some ((pn,i+1), m(X ↦ m X $\hat{+}$ m Y), e(|h := h e @ [(pn,i)]|))

The transitions of SUB X Y, which subtracts two numbers, and SET X n, which
initialises X with NAT n, are similar; just replace $\hat{+}$ with $\hat{-}$ or
change the update to m(X ↦ NAT n). The backwards jump JMPB t jumps t instructions
backwards. The conditional jump JMPL X Y t expects numbers at X and Y. If
the first number is less than the second it jumps t instructions forward,
otherwise just one.

cmd Π (pn,i) = Some (JMPB t) ⟶
  step Π ((pn,i),m,e) = Some ((pn,i−t), m, e(|h := h e @ [(pn,i)]|))

cmd Π (pn,i) = Some (JMPL X Y t) ∧ m X = NAT x ∧ m Y = NAT y ⟶
  step Π ((pn,i),m,e) = Some ((pn, i + (if x<y then t else 1)), m, e(|h := h e @ [(pn,i)]|))

The procedure call CALL X pn pushes the time (length of h e) and the current
memory onto the call stack, leaves the return position in X and jumps into
procedure pn. The procedure return RET X pops the topmost entry from the call
stack and jumps to the return position it expects in X.

cmd Π (pn,i) = Some (CALL X pn') ⟶ step Π ((pn,i),m,e) = Some
  ((pn',0), m(X ↦ POS (pn,i+1)), e(|cs := [(length (h e), m)] @ cs e, h := h e @ [(pn,i)]|))

cmd Π (pn,i) = Some (RET X) ∧ m X = POS r ⟶
  step Π ((pn,i),m,e) = Some (r, m, e(|cs := tl (cs e), h := h e @ [(pn,i)]|))

The move operation MOV X Y interprets the values at X and Y as locations x
and y; it copies the value at x to y.

cmd Π (pn,i) = Some (MOV X Y) ∧ m X = NAT x ∧ m Y = NAT y ⟶
  step Π ((pn,i),m,e) = Some ((pn,i+1), m(y ↦ m x), e(|h := h e @ [(pn,i)]|))

Finally, for HALT or in case the premises above do not hold, step returns None,
i.e. cmd Π p = Some HALT ⟶ step Π (p,m,e) = None.
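
The transition rules of this section can be collected into one executable sketch. The Python below is a hedged illustration, not the Isabelle/HOL definition: instructions are tuples, memory is a dictionary in which absent locations count as ILLEGAL, the environment is a pair (cs, h), the instruction lookup is passed in as a function `cmd`, and subtraction is assumed to be truncated at zero as for Isabelle naturals.

```python
ILLEGAL = ("ILLEGAL",)


def nat(n):
    return ("NAT", n)


def lift(op, a, b):
    """Lift a binary operator on naturals to typed values; other inputs give ILLEGAL."""
    if a[0] == "NAT" and b[0] == "NAT":
        return nat(op(a[1], b[1]))
    return ILLEGAL


def step(cmd, state):
    """One SAL transition; returns the successor state, or None if there is none."""
    (pn, i), m, (cs, h) = state
    ins = cmd((pn, i))
    if ins is None or ins[0] == "HALT":
        return None
    get = lambda loc: m.get(loc, ILLEGAL)
    env = lambda cs2: (cs2, h + [(pn, i)])  # every instruction extends the history
    op, args = ins[0], ins[1:]
    nxt = (pn, i + 1)
    if op == "SET":
        x, n = args
        return (nxt, {**m, x: nat(n)}, env(cs))
    if op in ("ADD", "SUB"):
        x, y = args
        f = (lambda a, b: a + b) if op == "ADD" else (lambda a, b: max(a - b, 0))
        return (nxt, {**m, x: lift(f, get(x), get(y))}, env(cs))
    if op == "JMPB":
        (t,) = args
        return ((pn, i - t), m, env(cs))
    if op == "JMPL":
        x, y, t = args
        a, b = get(x), get(y)
        if a[0] == "NAT" and b[0] == "NAT":
            return ((pn, i + (t if a[1] < b[1] else 1)), m, env(cs))
        return None
    if op == "CALL":
        x, callee = args
        return ((callee, 0), {**m, x: ("POS", nxt)}, env([(len(h), dict(m))] + cs))
    if op == "RET":
        (x,) = args
        r = get(x)
        return (r[1], m, env(cs[1:])) if r[0] == "POS" else None
    if op == "MOV":
        x, y = args
        a, b = get(x), get(y)
        if a[0] == "NAT" and b[0] == "NAT":
            return (nxt, {**m, b[1]: get(a[1])}, env(cs))
        return None
    return None


def run(cmd, state):
    """Iterate step until no successor exists and return the last state."""
    while True:
        successor = step(cmd, state)
        if successor is None:
            return state
        state = successor


# Tiny example program: B := 2; C := 3; B := B + C; halt.
prog = {(0, 0): ("SET", "B", 2), (0, 1): ("SET", "C", 3),
        (0, 2): ("ADD", "B", "C"), (0, 3): ("HALT",)}
final = run(prog.get, ((0, 0), {}, ([], [])))
```

Note that ADD on uninitialised memory does not get stuck; it stores ILLEGAL, which is why a separate safety policy is needed to rule such states out.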
2.3 Safety Logic and Policy

To notate and prove safety properties of programs formally we use a so-called
safety logic. The essential constituents of this logic are connectives for implication
⟹ and conjunction ⋀ and judgements for provability ⊢ and validity ⊨.

⋀ :: 'form list ⇒ 'form          ⊢ :: prog ⇒ 'form ⇒ bool

⟹ :: 'form ⇒ 'form ⇒ 'form      ⊨ :: state ⇒ 'form ⇒ bool

At this point we do not specify the syntax of formulas. This will be done later
by instantiating 'form in a deep and shallow style. However, we assume that the
formula language is expressive enough to characterise initial and safe states of
programs. That is, we assume that one can define functions initF :: prog ⇒ 'form,
which specifies initial states, and safeF :: prog ⇒ pos ⇒ 'form, which yields local
safety formulas. Together initF and safeF comprise a so-called safety policy. A
program is safe if all states (p,m,e) we can reach from an initial state (p0,m0,e0)
are safe.

isSafe Π = ∀p0 m0 e0 p m e. (p0,m0,e0) ⊨ initF Π ∧
  ((p0,m0,e0),(p,m,e)) ∈ (effS Π)* ⟶ (p,m,e) ⊨ safeF Π p

In this paper we instantiate initF such that it only holds for states (p,m,e)
where the program counter p is (0,0), the memory is uninitialised ∀x. m x =
ILLEGAL, the history is empty h e = [] and the call stack has one entry cs e
= [(0,m)] containing a copy of the initial memory m. The safety formula for
a position p, i.e. safeF Π p, will be constructed such that it guarantees safe
execution of the instruction at p. In our case this means all arguments have
proper types and numerical results this instruction yields are equal to or below
some maximum number MAX. In other words: the instruction is type safe and
does not cause an overflow. For example if we have ADD X Y at program
position p, the formula safeF Π p demands that variables X and Y have values
NAT x and NAT y such that x + y ≤ MAX.
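
As a small illustration of such a policy, the Python sketch below checks the local safety condition for ADD and tests a safety predicate along a run. It rests on assumptions that are not part of the paper: MAX is fixed to 255, typed values are encoded as tuples, and `is_safe` follows a single deterministic run for a bounded number of steps, so it only approximates isSafe.

```python
MAX = 255  # assumed bound for this example; the paper leaves MAX abstract


def safe_add(m, x, y):
    """Local safety of ADD X Y: both operands are NAT and the sum does not exceed MAX."""
    a = m.get(x, ("ILLEGAL",))
    b = m.get(y, ("ILLEGAL",))
    return a[0] == "NAT" and b[0] == "NAT" and a[1] + b[1] <= MAX


def is_safe(step, safe, init, fuel=1000):
    """Check safe on every state reached from init within fuel steps.

    step returns the successor state or None when execution stops.
    """
    state = init
    for _ in range(fuel):
        if state is None:
            return True
        if not safe(state):
            return False
        state = step(state)
    return True


ok = safe_add({"X": ("NAT", 200), "Y": ("NAT", 55)}, "X", "Y")
overflow = safe_add({"X": ("NAT", 200), "Y": ("NAT", 56)}, "X", "Y")
untyped = safe_add({"X": ("NAT", 1)}, "X", "Y")
```

The first call is safe because 200 + 55 equals the bound, the second exceeds it, and the third fails the type requirement because Y is uninitialised.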



2.4 Verification Condition Generation 

Equipped with ⋀, ⟹, initF and safeF we can define a generic VCG, which
transforms a given well formed program into a formula, the verification condition
VC, that is provable only if the program is safe. The VCG soundness theorem
below expresses this formally:

vcg :: prog ⇒ 'form          theorem: wf Π ∧ Π ⊢ vcg Π ⟶ isSafe Π

The wellformedness judgement wf demands that every instruction is annotated
and that the main procedure has no RET instructions. In our project [1] we
usually work with a VCG that also accepts programs where only targets of
backward jumps, entry and exit positions of procedures are annotated. However,
in this paper we focus on the safety logic and rather keep the VCG simple.
vcg Π = ⋀ ([initF Π ⟹ ⋀ [safeF Π (ipc Π), anF Π (ipc Π)]] @
  (map (λp. ⋀ (map (λ(p',B). ⋀ [safeF Π p, anF Π p, B] ⟹
                      wpF Π p p' (⋀ [safeF Π p', anF Π p']))
                   (succsF Π p)))
       (domC Π)))

In addition to the instruction and annotation fetch operations cmd and anF
this VCG uses various other auxiliary functions. With succsF Π p it computes
the list of all immediate successors p' of a position p paired with a branch
condition B. This branch condition is expected to hold whenever p' is
accessible from p at runtime. For example assume Π has at position p=(pn,i) an
instruction that jumps t instructions forward if some condition C holds, or 1
otherwise. Then we expect succsF Π p to yield two successor positions (pn,i+t)
and (pn,i+1) with C or its negation as branch conditions, i.e. succsF Π (pn,i)
= [((pn,i+t),C),((pn,i+1),¬C)]. The function wpF is named after Dijkstra's
operator for weakest preconditions. It takes a postcondition Q and constructs a
formula wpF Π p p' Q that covers exactly those states where the program Π
can make a transition from p to p' such that Q holds when we reach p'.
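
A successor function of this kind can be sketched in Python as follows. The sketch is an assumption-laden illustration rather than the paper's succsF: it takes the instruction directly instead of a program, encodes branch conditions as tagged tuples, and leaves RET out because its successors depend on the call sites.

```python
TRUE = ("TRUE",)


def succs(ins, pos):
    """Immediate successors of the instruction ins at position pos, each with a branch condition."""
    pn, i = pos
    op = ins[0]
    if op == "JMPL":
        _, x, y, t = ins
        cond = ("LESS", x, y)  # the jump is taken when the value at x is less than the value at y
        return [((pn, i + t), cond), ((pn, i + 1), ("NOT", cond))]
    if op == "JMPB":
        return [((pn, i - ins[1]), TRUE)]
    if op == "CALL":
        return [((ins[2], 0), TRUE)]
    if op == "HALT":
        return []
    if op == "RET":
        raise NotImplementedError("return targets depend on the call sites; not modelled here")
    return [((pn, i + 1), TRUE)]  # SET, ADD, SUB, MOV fall through


jmpl_succs = succs(("JMPL", "B", "M", 2), (1, 2))
```

For the JMPL instruction at position (1,2) this yields the jump target (1,4) guarded by the comparison and the fall-through position (1,3) guarded by its negation.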

The verification condition is a big conjunction. There is one initial conjunct and
one conjunct for each position in the code domain domC Π, which lists the
positions of all instructions in Π. Hence, the overall size of the VC is linear in
the program size. The initial conjunct demands that initial states are safe and
satisfy the initial annotation anF Π (ipc Π), where ipc Π denotes the initial
program counter ((0,0) in our case). The conjunct we get for each position
p in the code domain demands that a state (p,m,e) that is safe and satisfies
the annotation at p only has successor states (p',m',e') that satisfy the safety
formula and annotation at p'. For example, if a position p annotated with A only
has one successor p' with branch condition B and annotation A', we get this
conjunct inside the VC:

  ⋀ [safeF Π p, A, B] ⌊⇒⌋ wpF Π p p' (⋀ [safeF Π p', A'])
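The shape of this conjunction can be illustrated with a small Python sketch. It is not the Isabelle definition: formulas are represented as the hypothetical tuples ("and", [...]) and ("imp", a, b), the parameter functions are passed in as ordinary callables, and the two-position toy program is invented for the illustration.

```python
# Hypothetical formula representation: ("and", [f1, ...]) and ("imp", a, b).
def conj(fs):
    return ("and", list(fs))

def imp(a, b):
    return ("imp", a, b)

def vcg(init_f, safe_f, an_f, wp_f, succs_f, dom_c, ipc):
    """Build the VC skeleton: one initial conjunct plus one conjunct per position."""
    initial = imp(init_f, conj([safe_f(ipc), an_f(ipc)]))
    per_position = [
        conj(
            imp(conj([safe_f(p), an_f(p), b]),
                wp_f(p, q, conj([safe_f(q), an_f(q)])))
            for (q, b) in succs_f(p)
        )
        for p in dom_c
    ]
    return conj([initial] + per_position)

# Toy instantiation (assumed): two positions and a single fall-through edge.
dom_c = [(0, 0), (0, 1)]
edges = {(0, 0): [((0, 1), "True")], (0, 1): []}
vc = vcg("init",
         lambda p: f"safe{p}",
         lambda p: f"an{p}",
         lambda p, q, post: ("wp", p, q, post),
         lambda p: edges[p],
         dom_c, (0, 0))
print(len(vc[1]))  # 1 + len(dom_c) top-level conjuncts
```

The generator only plugs the parameter results into the skeleton; nothing is simplified, which is why the size stays linear in the number of positions and edges.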

So far we have not defined any of the auxiliary functions nor the safety logic
and policy. The VCG above is generic. By instantiating the parameter functions
one can use it for various PCC platforms. We have proven that the soundness
theorem above holds if these parameter functions meet some basic requirements.
The succsF function has to approximate the control flow graph of a program. It
can yield spurious successors, but must not omit any real successor or yield
invalid branch conditions.

assumption succsFcomplete:

  wf Π ∧ (p,m,e) ∈ safeA Π ∧ ((p,m,e),(p',m',e')) ∈ effS Π ⟶

  ∃B. (p',B) ∈ set (succsF Π p) ∧ (p,m,e) ⊨ B

This must only hold for all states in the safety closure safeA Π, the set of states
that can occur in a safe execution of Π. These are the initial states and
states that are reachable from these by only traversing states that are safe and
satisfy their annotation.
  (p,m,e) ⊨ initF Π ⟶ (p,m,e) ∈ safeA Π

  (p,m,e) ∈ safeA Π ∧ ((p,m,e),(p',m',e')) ∈ effS Π ∧
  (p,m,e) ⊨ safeF Π p ∧ (p,m,e) ⊨ anF Π p ∧

  (p',m',e') ⊨ safeF Π p' ∧ (p',m',e') ⊨ anF Π p' ⟶ (p',m',e') ∈ safeA Π
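For a finite transition system the two closure rules amount to a standard worklist computation. The following Python sketch is only an illustration under simplifying assumptions: states are plain integers, eff is a finite set of state pairs, and the single predicate ok stands in for "is safe and satisfies its annotation".

```python
def safety_closure(initial, eff, ok):
    """Initial states plus everything reachable through ok-to-ok transitions."""
    closure = set(initial)
    work = list(initial)
    while work:
        s = work.pop()
        if not ok(s):
            continue  # a state that fails the check contributes no successors
        for (a, b) in eff:
            if a == s and ok(b) and b not in closure:
                closure.add(b)
                work.append(b)
    return closure

# Toy system (assumed): a chain 0 -> 1 -> 2 -> 3 -> 4 where state 2 is not ok.
eff = {(0, 1), (1, 2), (2, 3), (3, 4)}
ok = lambda s: s != 2
print(safety_closure({0}, eff, ok))
```

States 3 and 4 are reachable in the plain transition relation but not in the closure, because every path to them passes through a state that fails the check.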

The wpF operator has to be compatible with the semantics of SAL. That is, the 
formula it yields must guarantee the postcondition in the successor state. 

assumption correctWpF:

  wf Π ∧ (p,m,e) ∈ safeA Π ∧ ((p,m,e),(p',m',e')) ∈ effS Π ∧
  (p,m,e) ⊨ (wpF Π p p' Q) ⟶ (p',m',e') ⊨ Q

Another requirement is the correctness of the safety logic. That is, provable
formulas must be valid for all states in safeA Π.

assumption correctSafetyLogic:

  wf Π ∧ Π ⊢ F ⟶ ∀s ∈ safeA Π. s ⊨ F

Finally, we require that the logical connectives have their ordinary semantics 
and that initF is consistent with ipc. 

assumptions

  s ⊨ (A ⌊⇒⌋ B) ⟶ s ⊨ A ⟶ s ⊨ B
  s ⊨ ⋀ Fs = (∀F ∈ set Fs. s ⊨ F)
  (p,m,e) ⊨ initF Π ⟶ ipc Π = p

Note that these requirements are kept very weak in order to allow for a wide
range of instantiations. With safeA Π in the premises, verifying these
requirements becomes simpler; one only has to consider states originating from a
safe execution. A lot of properties, for example the wellformedness of the call
stack, can be deduced from this fact.



3 Shallow Embedding 

3.1 Syntax 

In a shallow embedding logical formulas are written directly in the logic of the 
theorem prover. In our case this means SAL formulas become Isabelle/HOL 
predicates on states. 

type form = state => bool 

We can write arbitrary Isabelle functions from state to bool and use them to
describe machine states. Typically we do this using λ notation. For example
λ(p,m,e). m X = NAT 1 covers all states having the value NAT 1 at memory
location X. Since we have machine states with environments, we can also describe
states by relating them to former states. For example λ(p,m,e). m X = (m̄ e) X
holds for states where location X contains the same value as it did at call time of
the current procedure. Here we use the shortcut m̄ e for the memory at call time,
which we can retrieve from the environment e, i.e. m̄ e = snd (hd (cs e)). In
a similar fashion we can reconstruct the program counter or the environment
at call time, i.e. p̄ e = (h e)!k and ē e = e(|cs := tl (cs e); h := take k (h e)|)
where k = fst (hd (cs e)). The following formulas give some flavour of the style
of a shallow embedded formula language. They could be used to annotate the
example program OD. The function incA increments the offset of a position, i.e.
incA (pn,i) = (pn,i+1).

  A0 = λ(p,m,e). True,    A1 = λ(p,m,e). m B = NAT b0,

  A2 = ⋀ [A1, λ(p,m,e). m C = NAT c0],    A3 = ⋀ [A1,

         λ(p,m,e). ∃c. m C = NAT c ∧ (c ≠ 0 ⟶ (c = c0 ∧ b0 + c0 ≤ MAX))]

  A4 = λ(p,m,e). True

  A5a = λ(p,m,e). m P = POS (incA (p̄ e)) ∧ (∃b. m B = NAT b) ∧ (∃c. m C = NAT c)
  A5 = ⋀ [A5a, λ(p,m,e). ∀x. x ≠ P ⟶ m x = m̄ e x]

  A6a = λ(p,m,e). ∀x. x ≠ P ∧ x ≠ M ⟶ m x = m̄ e x
  A6 = ⋀ [A5a, A6a, λ(p,m,e). m M = NAT MAX]

  A7 = ⋀ [A5a, A6a, λ(p,m,e). ∃c. m C = NAT c ∧ m M = NAT (MAX − c)]
  A8 = ⋀ [A7, λ(p,m,e). ∃b n. m B = NAT b ∧ m M = NAT n ∧ n < b]
  A9 = λ(p,m,e). (∀x. x ≠ C ∧ x ≠ M ∧ x ≠ P ⟶ m x = m̄ e x) ∧

         (∃b c c'. m B = NAT b ∧ m C = NAT c ∧ m̄ e C = NAT c' ∧
           (c ≠ 0 ⟶ (c = c' ∧ b + c' ≤ MAX)))



3.2 Validity 

In the shallow embedding we define validity of formulas simply by application. 
  (p,m,e) ⊨ Q = Q (p,m,e)
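The idea that a shallow formula is just a predicate of the host language, and that validity is function application, carries over to any language with first-class functions. The Python sketch below is an analogy only: states are (pc, memory, environment) triples with a dictionary as memory, plain integers replace NAT values, and the names conj and holds are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[Tuple[int, int], Dict[str, int], dict]
Form = Callable[[State], bool]

def conj(fs: List[Form]) -> Form:
    """Conjunction of shallow formulas is again a predicate on states."""
    return lambda s: all(f(s) for f in fs)

def holds(s: State, q: Form) -> bool:
    """Validity by application: s |= Q is simply Q(s)."""
    return q(s)

# Two annotations in the style of A1 and A2, with assumed constants 3 and 4.
A1: Form = lambda s: s[1]["B"] == 3
A2: Form = conj([A1, lambda s: s[1]["C"] == 4])

good = ((0, 2), {"B": 3, "C": 4}, {})
bad = ((0, 2), {"B": 3, "C": 5}, {})
print(holds(good, A2), holds(bad, A2))
```

Because formulas are opaque functions here, a program can evaluate them but cannot inspect their structure, which is the limitation the comparison section returns to.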

3.3 Provability 

The provability judgement ⊢ of a logic is usually defined with derivation rules.
However, since we write formulas as HOL predicates, we can use Isabelle/HOL's
built-in derivation rules as proof calculus. We consider a formula F provable if
it is valid for all states in safeA Π:   Π ⊢ F = ∀s. s ∈ safeA Π ⟶ s ⊨ F

3.4 Weakest Precondition 

The predicate that wpF Π p p' Q yields computes the successor state for the
transition from p to p' in program Π and applies the postcondition Q to it. For
ADD, CALL and MOV, we define wpF as follows. The remaining instructions
are analogous.

wpF Π p p' Q = (case cmd Π p of None ⇒ λ(p,m,e). False
 | Some a ⇒ case a of ...

 | ADD X Y ⇒ λ(p,m,e). Q (p', m(X ↦ (m X) '+' (m Y)), e(|h := (h e)@[p]|))

 | CALL X pn ⇒ λ(p,m,e). Q (p', m(X ↦ POS (pn,0)), e(|h := (h e)@[p]|))

 | MOV X Y ⇒ λ(p,m,e). (case m X of ILLEGAL ⇒ False | POS r ⇒ False |
     NAT x ⇒ (case m Y of ILLEGAL ⇒ False | POS r' ⇒ False |

       NAT y ⇒ Q (p', m(y ↦ m x), e(|h := (h e)@[p]|)))) ...
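The ADD case can be mimicked in Python to show how a shallow weakest precondition is just a closure that runs the instruction and then asks the postcondition. This is a sketch under assumptions: memory is a dictionary of plain integers, the environment is a dictionary with a history list under the key "h", and wp_add is a hypothetical name.

```python
MAX = 10  # assumed overflow bound for the illustration

def wp_add(x, y, p_next, q):
    """Shallow wp for a hypothetical ADD x y: execute, then apply q to the result."""
    def pre(state):
        p, m, e = state
        m2 = dict(m)
        m2[x] = m[x] + m[y]          # m(X |-> m X + m Y)
        e2 = dict(e)
        e2["h"] = e["h"] + [p]       # history is extended by the old position
        return q((p_next, m2, e2))
    return pre

post = lambda s: s[1]["B"] <= MAX
pre = wp_add("B", "C", (0, 4), post)

print(pre(((0, 3), {"B": 3, "C": 4}, {"h": []})))   # 7 <= MAX
print(pre(((0, 3), {"B": 7, "C": 4}, {"h": []})))   # 11 <= MAX fails
```

Nothing is simplified when pre is built; the arithmetic only happens once the predicate is applied to a concrete state, or in the prover once the λ terms are contracted.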
3.5 Code Generation

Isabelle can generate ML programs out of executable Isabelle/HOL definitions 
[4]. However, for wpF this code generator does not produce the kind of ML 
program we want. Due to our shallow embedding the code generator also turns 
safety logic formulas into ML programs. Instead we would like them to be han- 
dled as terms of type state => bool. A way out is to enhance the code generator 
by a quotation/antiquotation mechanism. We can introduce functions term and
toterm :: 'a ⇒ 'a that are identities for Isabelle’s inference system. For the code
generator these functions serve as markers: When it generates code for an Isa- 
belle term and steps into a term quotation it treats the following input term 
as output of the currently generated ML program. If inside this mode a toterm 
antiquotation appears, it switches back to normal mode. For example, consider 
the following two Isabelle definitions: 

  f = λn. n + n + n,    g = λn. term (toterm n + (toterm (n + n)))

When applied to operand 5, the ML program we get for f would return the
integer 15, whereas the one for g would return the term 5 + 10. Using this 
mechanism we are able to generate an executable VCG from the definitions in 
Isabelle. 
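The difference between f and g can be imitated in Python by letting the quoted part build a term instead of computing a number. This is only an analogy to the quotation/antiquotation idea, not the Isabelle code generator: terms are hypothetical (operator, left, right) tuples and show is an assumed pretty-printer.

```python
def f(n):
    # Everything is evaluated: the result is a number.
    return n + n + n

def g(n):
    # The outer '+' is quoted and stays a term node; both operands are
    # antiquoted, so they are computed before being placed into the term.
    return ("+", n, n + n)

def show(term):
    op, left, right = term
    return f"{left} {op} {right}"

print(f(5))        # 15
print(show(g(5)))  # 5 + 10
```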

4 Deep Embedding 

4.1 Syntax 

In a deep embedding we represent logical formulas as a datatype. At leaf positions 
we have expressions. 

datatype expr = V nat | Lv nat | ⌊tval⌋ | Pc | Rp | Tm |

                expr ⌊+⌋ expr | expr ⌊−⌋ expr | expr ⌊*⌋ expr |

                ⌊if⌋ expr ⌊=⌋ expr ⌊then⌋ expr ⌊else⌋ expr | Deref expr | Old expr

Following Winskel [17] we distinguish two kinds of variables. Program variables
V 1, V 2, ... denote values we find at specific locations in memory. For example
V 1 stands for the value we find at location 1 in memory. Apart from these we
have logical variables Lv 1, Lv 2, ..., which stand for arbitrary values that do
not depend on the state of a program. Quantification will be defined later on
only for logical variables. Since these are not affected by machine instructions we
will not have to bother about them when we define the wpF operator later on.
Apart from variables we have constants ⌊NAT 1⌋, ⌊POS (0,1)⌋, ⌊ILLEGAL⌋, ...
and special identifiers for the current program counter Pc, the return position
of the current procedure Rp and the system time Tm. These primitives can be
combined via arithmetical operators and conditionals. To support pointers, we
have the Deref E expression. It yields the value we find at address a, provided
E evaluates to NAT a. Finally, we have a call state expression Old E, which
interprets an expression E in the call state of the current procedure. This enables
one to describe states by relating them to former states.
datatype form = ⌊True⌋ | ⌊False⌋ | ⋀ form list | form ⌊⇒⌋ form | ⌊¬⌋ form |
                expr ⌊=⌋ expr | expr ⌊<⌋ expr | expr ⌊≤⌋ expr | expr ⌊:⌋ vtype | ⌊∀⌋ nat form

Formulas are either the boolean constants ⌊True⌋ and ⌊False⌋, conjunctions ⋀ Fs,
implications A ⌊⇒⌋ B or negations ⌊¬⌋ A. In addition we have relational
formulas E ⌊=⌋ E', E ⌊<⌋ E' or E ⌊≤⌋ E' and a type checking formula E ⌊:⌋ T, where
T can either be Pos for POS or Nat for NAT.

datatype vtype = Pos | Nat

Finally, we can quantify over logical variables. In ⌊∀⌋ v F all free occurrences of
Lv v in F are bound and F is expected to hold for every value of Lv v. Below we
have the annotations for our example program, written in this new style.

  A0 = ⌊True⌋,   A1 = V B ⌊=⌋ ⌊NAT b0⌋,   A2 = ⋀ [A1, V C ⌊=⌋ ⌊NAT c0⌋]

  A3 = ⋀ [V C ⌊:⌋ Nat, ⌊NAT 0⌋ ⌊<⌋ V C ⌊⇒⌋

          ⋀ [V B ⌊+⌋ V C ⌊≤⌋ ⌊NAT MAX⌋, V C ⌊=⌋ ⌊NAT c0⌋]]

  A4 = ⌊True⌋,   A5a = ⋀ [V P ⌊=⌋ Rp, V B ⌊:⌋ Nat, V C ⌊:⌋ Nat]

  A5b = ⌊∀⌋ x (⌊¬⌋ (Lv x ⌊=⌋ ⌊NAT P⌋) ⌊⇒⌋ Deref (Lv x) ⌊=⌋ Old (Deref (Lv x)))

  A5 = ⋀ [A5a, A5b],   A6a = ⌊∀⌋ x (⋀ [⌊¬⌋ (Lv x ⌊=⌋ ⌊NAT P⌋),

          ⌊¬⌋ (Lv x ⌊=⌋ ⌊NAT M⌋)] ⌊⇒⌋ Deref (Lv x) ⌊=⌋ Old (Deref (Lv x)))

  A6 = ⋀ [A5a, A6a, V M ⌊=⌋ ⌊NAT MAX⌋]

  A7 = ⋀ [A5a, A6a, V M ⌊=⌋ ⌊NAT MAX⌋ ⌊−⌋ V C],   A8 = ⋀ [A7, V M ⌊<⌋ V B]

  A9 = ⋀ [A5a, ⌊∀⌋ x (⋀ [⌊¬⌋ (Lv x ⌊=⌋ ⌊NAT P⌋), ⌊¬⌋ (Lv x ⌊=⌋ ⌊NAT M⌋),

          ⌊¬⌋ (Lv x ⌊=⌋ ⌊NAT C⌋)] ⌊⇒⌋ Deref (Lv x) ⌊=⌋ Old (Deref (Lv x))),

          ⌊¬⌋ (V C ⌊=⌋ ⌊NAT 0⌋) ⌊⇒⌋ ⋀ [V C ⌊=⌋ Old (V C), V B ⌊+⌋ V C ⌊≤⌋ ⌊NAT MAX⌋]]

4.2 Validity 

We use eval :: (nat ⇒ tval) ⇒ state ⇒ expr ⇒ tval to evaluate expressions on
a given state and interpretation for logical variables. Program variables stand
for memory locations, logical variables are interpreted via L and constants are
directly converted to values.

  eval L (p,m,e) (V v) = m v,   eval L s (Lv v) = L v,   eval L s ⌊tv⌋ = tv

The identifier Pc stands for the program counter, Tm for the system time (number
of executed instructions), and Rp for the return position of the current
procedure. Rp evaluates to POS (0,0) if we are in the main procedure or to
ILLEGAL in case of a malformed call stack.

  eval L (p,m,e) Pc = POS p      eval L (p,m,e) Tm = NAT (length (h e))
  eval L (p,m,e) Rp = (case length (cs e) of 0 ⇒ ILLEGAL
    | Suc n ⇒ (case n of 0 ⇒ POS (0,0) | Suc n' ⇒ POS (incA (p̄ e))))

Arithmetical expressions and conditionals are evaluated recursively.

  eval L s (E ⌊+⌋ E') = (eval L s E) '+' (eval L s E')

The cases for ⌊−⌋ and ⌊*⌋ are analogous.

  eval L s (⌊if⌋ E0 ⌊=⌋ E1 ⌊then⌋ E2 ⌊else⌋ E3) =

    (if (eval L s E0 = eval L s E1) then eval L s E2 else eval L s E3)






With Deref E we fetch the value at position a, provided E evaluates to NAT a.

  eval L (p,m,e) (Deref E) = (case (eval L (p,m,e) E)
    of ILLEGAL ⇒ ILLEGAL | POS r ⇒ ILLEGAL | NAT a ⇒ m a)

Finally, we evaluate Old E by retrieving the call state from the environment.

  eval L (p,m,e) (Old E) = eval L (p̄ e, m̄ e, ē e) E

Next, we define the validity of formulas relative to states and interpretations.

  L,s ⊨ ⌊True⌋      ¬ L,s ⊨ ⌊False⌋      L,s ⊨ ⋀ Fs = (∀F ∈ set Fs. L,s ⊨ F)

  L,s ⊨ F ⌊⇒⌋ F' = (L,s ⊨ F ⟶ L,s ⊨ F')      L,s ⊨ ⌊¬⌋ F = ¬ (L,s ⊨ F)

  L,s ⊨ E ⌊=⌋ E' = (eval L s E = eval L s E')

  L,s ⊨ E ⌊:⌋ T = (case (eval L s E)

    of ILLEGAL ⇒ False | POS p ⇒ T=Pos | NAT n ⇒ T=Nat)

  L,s ⊨ E ⌊<⌋ E' = (case (eval L s E) of ILLEGAL ⇒ False | POS r ⇒ False |
    NAT x ⇒ (case (eval L s E') of ILLEGAL ⇒ False | POS r' ⇒ False |

      NAT y ⇒ x < y))

The case for ⌊≤⌋ is analogous to ⌊<⌋; just replace < with ≤. The meaning
of ⌊∀⌋ v F is that F holds irrespective of the interpretation of Lv v.

  L,s ⊨ ⌊∀⌋ v F = (∀tv. L(v ↦ tv),s ⊨ F)
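A reduced version of eval and the validity relation can be sketched in Python to make the deep embedding concrete. The sketch makes several simplifying assumptions that are not in the paper: the state is reduced to the memory (so Pc, Rp, Tm and Old are omitted), quantification is left out, terms are tagged tuples, unmapped memory cells read as ILLEGAL, and value addition on non-NAT operands yields ILLEGAL.

```python
ILLEGAL = ("ILLEGAL",)

def NAT(n):
    return ("NAT", n)

def ev(L, m, E):
    """Evaluate a deep-embedded expression under interpretation L and memory m."""
    tag = E[0]
    if tag == "V":                       # program variable: read memory
        return m.get(E[1], ILLEGAL)
    if tag == "Lv":                      # logical variable: read interpretation
        return L[E[1]]
    if tag == "Const":                   # quoted value
        return E[1]
    if tag == "Plus":
        a, b = ev(L, m, E[1]), ev(L, m, E[2])
        if a[0] == "NAT" and b[0] == "NAT":
            return NAT(a[1] + b[1])
        return ILLEGAL                   # assumption: ill-typed sums are ILLEGAL
    if tag == "Deref":
        a = ev(L, m, E[1])
        return m.get(a[1], ILLEGAL) if a[0] == "NAT" else ILLEGAL
    raise ValueError(f"unknown expression tag {tag!r}")

def valid(L, m, F):
    """Validity of a deep-embedded formula, by recursion over its structure."""
    tag = F[0]
    if tag == "True":
        return True
    if tag == "False":
        return False
    if tag == "And":
        return all(valid(L, m, G) for G in F[1])
    if tag == "Imp":
        return (not valid(L, m, F[1])) or valid(L, m, F[2])
    if tag == "Not":
        return not valid(L, m, F[1])
    if tag == "Eq":
        return ev(L, m, F[1]) == ev(L, m, F[2])
    if tag == "Le":
        a, b = ev(L, m, F[1]), ev(L, m, F[2])
        return a[0] == "NAT" and b[0] == "NAT" and a[1] <= b[1]
    raise ValueError(f"unknown formula tag {tag!r}")

# Memory with locations 1, 2 and 5; location 5 holds a pointer to location 1.
m = {1: NAT(3), 2: NAT(4), 5: NAT(1)}
no_overflow = ("Le", ("Plus", ("V", 1), ("V", 2)), ("Const", NAT(10)))
print(valid({}, m, no_overflow))
print(ev({}, m, ("Deref", ("V", 5))))
```

Unlike the shallow version, formulas here are data, so other functions can traverse and transform them.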

4.3 Provability 

Provability is defined in a similar manner as for the shallow embedding. A
formula is considered provable if it holds for all interpretations and states in
safeA Π.

  Π ⊢ F = ∀L. ∀s ∈ safeA Π. L,s ⊨ F

To show provability of a formula, there are two alternatives. One can either
expand the definition of ⊢ and work directly with the inference rules of HOL. This
makes sense if the code consumer’s logic is HOL (something that the shallow
embedding requires). On the other hand, if the code consumer’s safety logic is more
specialised, the deep embedding can still model the precise inference system
involved. For example, we have derived suitable introduction and elimination rules
for our language of formulas that do not rely on λ calculus and HOL. However,
proving with deep embedded inference rules inside Isabelle/HOL turned out to
be inconvenient. The proof tools are designed to prove HOL formulae, not
elements of a datatype. In addition the ⌊∀⌋ elimination rule causes trouble. It says
that from ⌊∀⌋ v F we can deduce F[t/v], which is F with all free occurrences
of Lv v replaced by some term t. We need a form of substitution that renames
bound logical variables in F when they occur as free variables in t. This
renaming complicates the correctness proof and turned out to double the size of
our deep embedding theories. Nevertheless, defining and verifying deep embedded
inference rules inside Isabelle/HOL pays off if one wants to use specialised
tools for proof search and checking. Since Isabelle is generic one can also think
about instantiating the safety logic as a new object logic.

4.4 Weakest Precondition 

A big difference between our shallow and deep embedding lies in the definition of
the wpF operator. In the deep embedding we express the effects of instructions
at the level of formulas with substitutions. For these we use finite maps, which we
internally represent as lists of pairs, e.g. fm = [(1,1), (2,4), (3,5), (3,6)]. Finite
maps enable us to generate executable ML code for map operations like lookup
_↓_ :: ('a ⇀ 'b) ⇒ 'a ⇒ 'b option, domain dom :: ('a ⇀ 'b) ⇒ 'a list or range
ran :: ('a ⇀ 'b) ⇒ 'b list. A few examples demonstrate how these operators work:
fm↓0 = None, fm↓3 = Some 5, dom fm = [1,2,3] and ran fm = [1,4,5]. Note that a
pair (x,y) is overwritten by a pair (x,y') to the left of it. For ADD, CALL and
MOV, we define wpF as follows. The remaining instructions are analogous.
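The list-of-pairs representation and its three operations are easy to reproduce. The Python sketch below uses None in place of the option type and hypothetical function names; it replays the example map from the text.

```python
def lookup(fm, key):
    """Return the value of the leftmost pair with this key, or None."""
    for k, v in fm:
        if k == key:
            return v
    return None

def dom(fm):
    """Keys in order of first occurrence, without duplicates."""
    keys = []
    for k, _ in fm:
        if k not in keys:
            keys.append(k)
    return keys

def ran(fm):
    """Values that are actually visible through lookup."""
    return [lookup(fm, k) for k in dom(fm)]

fm = [(1, 1), (2, 4), (3, 5), (3, 6)]
print(lookup(fm, 0), lookup(fm, 3), dom(fm), ran(fm))
```

The pair (3, 6) is shadowed by (3, 5) to its left, so 6 does not appear in the range.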

wpF Π p p' Q = (case cmd Π p of None ⇒ ⌊False⌋ | Some a ⇒ case a of ...
 | ADD X Y ⇒ substF [(Tm, Tm ⌊+⌋ ⌊NAT 1⌋), (Pc, ⌊POS p'⌋),

                     (V X, V X ⌊+⌋ V Y)] Q

 | CALL X pn ⇒ popCs (substF [(Tm, Tm ⌊+⌋ ⌊NAT 1⌋), (Rp, ⌊POS (pn,i+1)⌋),
                     (Pc, ⌊POS p'⌋), (V X, ⌊POS (pn,i+1)⌋)] Q)

 | MOV X Y ⇒ substPtF X Y [(Tm, Tm ⌊+⌋ ⌊NAT 1⌋), (Pc, ⌊POS p'⌋)] Q ...

The substitution function substF :: (expr ⇀ expr) ⇒ form ⇒ form is the main
workhorse for the deep embedding. With substF em F we simultaneously
substitute expressions of the form V v, Tm, Pc or Rp in a formula F according to
a finite map em. It traverses F and applies substE em to all expressions it
finds.

  substF em ⌊True⌋ = ⌊True⌋      substF em ⌊False⌋ = ⌊False⌋

  substF em (⋀ Fs) = ⋀ (map (substF em) Fs)

  substF em (F1 ⌊⇒⌋ F2) = (substF em F1) ⌊⇒⌋ (substF em F2)

  substF em (⌊¬⌋ F) = ⌊¬⌋ (substF em F)

  substF em (E ⌊:⌋ T) = (substE em E) ⌊:⌋ T

  substF em (⌊∀⌋ v F) = ⌊∀⌋ v (substF em F)

Expressions of the form Lv v or ⌊tv⌋ are ignored by substE, because they are not
affected by instructions. Here Winskel’s [17] distinction of program and logical
variables pays off. In wpF only program variables appear in the expressions
we substitute in. Hence, we do not have to rename bound (=logical) variables.
However, a substitution with renaming is useful when one wants to define deep
embedded inference rules (see §4.3). For the remaining primitive expressions
substE looks up the expression map and replaces them with their substitute if
there is one. Otherwise substE just recurses down the expression structure.

  E = Lv v ∨ E = ⌊tv⌋ ⟶ substE em E = E
  E = V v ∨ E ∈ {Pc,Rp,Tm} ⟶

    substE em E = (case em↓E of None ⇒ E | Some E' ⇒ E')

  ∘ ∈ {⌊+⌋,⌊−⌋,⌊*⌋} ⟶ substE em (E1 ∘ E2) = (substE em E1) ∘ (substE em E2)

  substE em (⌊if⌋ E1 ⌊=⌋ E2 ⌊then⌋ E3 ⌊else⌋ E4) = (⌊if⌋ (substE em E1) ⌊=⌋
    (substE em E2) ⌊then⌋ (substE em E3) ⌊else⌋ (substE em E4))
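How substitution turns a postcondition into a precondition can be seen on a tiny example. The Python sketch below covers only part of the language (no conditionals and no Deref, which needs the extra care described next), represents terms as tagged tuples, and uses a list of pairs as the expression map; all names are hypothetical.

```python
def lookup(em, key):
    for k, v in em:
        if k == key:
            return v
    return None

def subst_e(em, E):
    """Simultaneous substitution on expressions (conditionals and Deref omitted)."""
    tag = E[0]
    if tag in ("Lv", "Const", "Old"):
        return E                         # untouched by instructions
    if tag in ("V", "Pc", "Rp", "Tm"):
        hit = lookup(em, E)
        return E if hit is None else hit  # the substitute is not traversed again
    if tag in ("Plus", "Minus", "Times"):
        return (tag, subst_e(em, E[1]), subst_e(em, E[2]))
    raise ValueError(f"unsupported expression tag {tag!r}")

def subst_f(em, F):
    """Traverse a formula and apply subst_e to every expression in it."""
    tag = F[0]
    if tag in ("True", "False"):
        return F
    if tag == "And":
        return ("And", [subst_f(em, G) for G in F[1]])
    if tag == "Imp":
        return ("Imp", subst_f(em, F[1]), subst_f(em, F[2]))
    if tag == "Not":
        return ("Not", subst_f(em, F[1]))
    if tag in ("Eq", "Less", "Le"):
        return (tag, subst_e(em, F[1]), subst_e(em, F[2]))
    if tag == "Forall":
        return ("Forall", F[1], subst_f(em, F[2]))
    raise ValueError(f"unsupported formula tag {tag!r}")

# Effect of a hypothetical ADD 1 2 on the postcondition "V 1 <= 10".
em = [(("Tm",), ("Plus", ("Tm",), ("Const", 1))),
      (("V", 1), ("Plus", ("V", 1), ("V", 2)))]
Q = ("Le", ("V", 1), ("Const", 10))
print(subst_f(em, Q))
```

The result mentions V 1 + V 2 where the postcondition mentioned V 1, which is exactly the condition on the state before the addition.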

In case of Deref E, we have to check whether E evaluates to NAT v where v is
the location of a variable V v that is substituted by em to some expression E'.
In this case the Deref E expression needs to be substituted as well. Since the
evaluation of E depends on the state we cannot do this statically. A way out is
to replace Deref E by another expression that incorporates this check, e.g. (⌊if⌋
E'' ⌊=⌋ ⌊NAT v⌋ ⌊then⌋ E' ⌊else⌋ Deref E'') where E'' = substE em E and
em↓(V v) = Some E'. This check needs to be done for all variables in the domain
of em; we fetch them with the auxiliary function changedvars.

  v ∈ set (changedvars em) = (∃E'. em↓(V v) = Some E')

  substE em (Deref E) = (let E'' = substE em E;

    res = (foldl (λE''' (v,E'). (⌊if⌋ E'' ⌊=⌋ ⌊NAT v⌋ ⌊then⌋ E' ⌊else⌋ E'''))

             (Deref E'') (changedvars em)) in res)

In this definition we use the HOL function foldl, which calls its input function
recursively over a list of arguments.

  foldl f a [] = a      foldl f a (x#xs) = foldl f (f a x) xs
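Python's functools.reduce with an initial value behaves like foldl, so the chain of guards around a Deref expression can be sketched directly. The representation is assumed: conditionals are ("If", lhs, rhs, then, else) tuples, changed variables are given as (location, substitute) pairs, and guard_deref is a hypothetical name.

```python
from functools import reduce

def guard_deref(E2, changed):
    """Wrap Deref E2 in one conditional per changed variable (foldl-style)."""
    def step(acc, pair):
        v, substitute = pair
        # if E2 = NAT v then substitute else <what we had so far>
        return ("If", E2, ("Const", ("NAT", v)), substitute, acc)
    return reduce(step, changed, ("Deref", E2))

# reduce with an initial value follows the foldl equations: ((10 - 1) - 2) - 3.
print(reduce(lambda a, x: a - x, [1, 2, 3], 10))

pointer = ("V", 7)
new_v1 = ("Plus", ("V", 1), ("V", 2))
print(guard_deref(pointer, [(1, new_v1)]))
```

With an empty list of changed variables the function returns the plain Deref expression, matching the base case of foldl.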

Since we use substE only to express the effects of instructions on the current 
state, it ceases to play a role when we come to an expression that refers to an- 
other state. Hence, substE terminates when it reaches an Old expression. 

substE em {Old E) = Old E 

To express the effect of pointer instructions, e.g. MOV X Y, we use the special
substitution function substPtF :: nat ⇒ nat ⇒ (expr ⇀ expr) ⇒ form ⇒ form.

It works exactly like substF except that it calls substPtE :: nat ⇒ nat ⇒
(expr ⇀ expr) ⇒ expr ⇒ expr when it encounters an expression. The function
substPtE is a variant of substE that does additional transformations for variables
and Deref expressions. When we execute MOV X Y it could be that the target
location NAT v stored in Y coincides with a variable V v in an expression. In
this case the value of V v after the MOV X Y instruction becomes the value
at the location we find in X. To express this effect we can replace V v with a
conditional expression.

  substPtE X Y em (V v) =

    ⌊if⌋ V Y ⌊=⌋ ⌊NAT v⌋ ⌊then⌋ Deref (V X) ⌊else⌋ (substE em (V v))

For Deref E expressions the same technique can be applied.

  substPtE X Y em (Deref E) = let E'' = substPtE X Y em E; res = ...
    in (⌊if⌋ V Y ⌊=⌋ E'' ⌊then⌋ Deref (V X) ⌊else⌋ res)

Finally, we need special formula manipulations for procedure calls and returns.
Remember that CALL pushes the current state (call state) onto the call stack.
With popCs the wpF function reverses this effect. After the call a new procedure
is active and expressions of the form Old E have a different meaning. The
expression Old E evaluated after the call yields the same value as E does before
(because (p̄ e', m̄ e', ē e') = (p,m,e) when e' = e(|cs := (length (h e), m) # (cs e);
h := (h e)@[p]|)). To ensure validity of some formula Q after the call, popCs
Q replaces occurrences of Old E in Q with E. For RET we do the opposite; we
replace Old (Old E) with Old E.
5 Comparison

Expressiveness 

In our shallow embedding we use predicates in HOL as assertion language. These 
are more expressive than the deep embedded first order formulas. One can quan- 
tify over functions, use expressions of any type and has direct and unrestricted 
access to the state. For example we verified list reversal [1] in our shallow em- 
bedding, but we found it difficult to do this example in the deep embedding, 
which does not offer expressions for lists. However, this shortcoming could be 
overcome by utilizing a richer assertion language. 

Proof Size 

Shallow and deep embedding differ in the definition of wpF; one uses λ
abstraction, the other substitutions. This difference affects verification
conditions and their proofs. For example, for the transition from (0,2) to (1,0)
in our example program we get these formulas:

  VCs = ⋀ [λ(p,m,e). True, λ(p,m,e). m B = NAT b0 ∧ m C = NAT c0, λ(p,m,e). True] ⌊⇒⌋
    (λ(p,m,e). ⋀ [λ(p,m,e). MAX ≤ MAX, λ(p,m,e). m P = POS (incA (p̄ e)) ∧
        (∃b. m B = NAT b) ∧ (∃c. m C = NAT c) ∧ (∀x. x ≠ P ⟶ m x = m̄ e x)]

      ((0,2), m[P ↦ POS (0,3)], e(|h := (h e)@[(0,2)]; cs := (length (h e), m) # (cs e)|)))

  VCd = ⋀ [⌊True⌋, ⋀ [V B ⌊=⌋ ⌊NAT b0⌋, V C ⌊=⌋ ⌊NAT c0⌋], ⌊True⌋] ⌊⇒⌋

    ⋀ [⌊NAT MAX⌋ ⌊≤⌋ ⌊NAT MAX⌋, ⌊POS (0,3)⌋ ⌊=⌋ ⌊POS (0,3)⌋, V B ⌊:⌋ Nat,

       V C ⌊:⌋ Nat, ⌊∀⌋ x (⌊¬⌋ (Lv x ⌊=⌋ ⌊NAT P⌋) ⌊⇒⌋ Deref (Lv x) ⌊=⌋ Deref (Lv x))]

In the formula VCs, which results from the shallow embedding, we have various
uncontracted λ terms. This is because the VCG does not simplify; it just
plugs annotations and safety formulas into a skeleton of conjunctions and
implications determined by the control flow graph. The contraction of these λ terms
is done when we prove them in Isabelle. In VCd, which results from the deep
embedding, these simplifications are carried out by the substitution function,
which is executed when we run the VCG. Hence, the proof of VCs involves more
simplification steps than the one of VCd. For example, in VCs we find after β
contraction this subformula:

  m[P ↦ POS (0,3)] x = m̄ (e(|h := (h e)@[(0,2)]; cs := (length (h e), m) # (cs e)|)) x

Knowing that x ≠ P and the definitions of ↦, m̄ and record updates, we can
simplify this to the triviality m x = m x. In VCd this triviality is already
exposed by the wpF operator, which yields Deref (Lv x) ⌊=⌋ Deref (Lv x) in this
situation. The VC we get for our example program can be proven automatically
in Isabelle using built-in decision procedures for Presburger arithmetic. The
latter is required for the formula we get for the transition from (1,4) to (0,4).
There we have to show that the addition operation at (0,3) cannot overflow.
The resulting proof object for the shallow embedding is about twice the size of
the one for the deep embedding. Other experiments [1] confirm this fact.
Formula Optimisations

Another advantage we get from the deep embedding is that we can write
Isabelle/HOL functions that operate on the structure of formulas. This enables us
to optimise VCs after/during their construction. Elsewhere [1] we present an
optimizer for VCs in the deep embedding. It evaluates constant formulas and
subexpressions, for example ⌊NAT MAX⌋ ⌊≤⌋ ⌊NAT MAX⌋ can be reduced to ⌊True⌋.
In addition it simplifies implications, for example A ⌊⇒⌋ ⌊True⌋ or ⋀ [...,A,...]
⌊⇒⌋ A, and conjunctions, for example ⋀ [...,⌊True⌋,...] or ⋀ [A, ⋀ [B,C], D].
It can also do some trivial deductions, for example V B ⌊=⌋ ⌊NAT b0⌋ implies
V B ⌊:⌋ Nat. These transformations, which can be done in time quadratic in the
formula size, suffice to reduce the size of VCs and their proofs considerably. For
example VCd can be reduced to ⌊True⌋. Although these optimisations do not
always trivialise VCs, experiments [1] show that this leads to proof objects that
are about 3 times smaller than they are in the shallow embedding. More could be
gained by coupling the optimizer to a proof procedure that performs
introduction and elimination rules on our first order formula language.
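A few of the rewrites named above can be sketched as a recursive simplifier. This is an illustrative Python sketch, not the optimizer from [1]: formulas are tagged tuples, constants are plain integers, and only constant comparison, syntactic equality, conjunction flattening and three implication rules are implemented.

```python
TRUE, FALSE = ("True",), ("False",)

def simp(F):
    """Apply a handful of structural simplifications bottom-up."""
    tag = F[0]
    if tag == "Le" and F[1][0] == "Const" and F[2][0] == "Const":
        return TRUE if F[1][1] <= F[2][1] else FALSE
    if tag == "Eq" and F[1] == F[2]:
        return TRUE                      # both sides are the same expression
    if tag == "And":
        out = []
        for G in map(simp, F[1]):
            if G == TRUE:
                continue                 # drop trivial conjuncts
            if G == FALSE:
                return FALSE
            out.extend(G[1] if G[0] == "And" else [G])   # flatten nesting
        if not out:
            return TRUE
        return out[0] if len(out) == 1 else ("And", out)
    if tag == "Imp":
        a, b = simp(F[1]), simp(F[2])
        if b == TRUE or a == FALSE or a == b:
            return TRUE
        if a == TRUE:
            return b
        if a[0] == "And" and b in a[1]:
            return TRUE                  # conclusion is one of the premises
        return ("Imp", a, b)
    return F

A = ("Eq", ("V", 1), ("Const", 3))
B = ("Eq", ("V", 2), ("Const", 4))
C = ("Le", ("V", 1), ("V", 2))
D = ("Le", ("V", 2), ("V", 3))

vc = ("Imp",
      ("And", [TRUE, ("And", [A, B]), TRUE]),
      ("And", [("Le", ("Const", 10), ("Const", 10)),
               ("Eq", ("Deref", ("Lv", 0)), ("Deref", ("Lv", 0)))]))
print(simp(vc))
print(simp(("And", [A, ("And", [B, C]), D])))
```

The first formula has the same shape as VCd with assumed constants, and it collapses completely because every conjunct of the conclusion is trivial.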

Annotation Analysis 

In the shallow embedding we cannot analyse annotations in the VCG or its 
helper functions. This is because HOL predicates cannot be structurally anal- 
ysed by other HOL functions (Isabelle does not support reflection). In the deep 
embedding the structure of formulas is accessible and can be used to handle more 
complex machine instructions like computed gotos. We simply demand that the 
possible targets of such jumps, which are runtime values and therefore hard to 
determine statically, must be annotated. Then we can define a succsF function 
that reads off the possible successors from the annotation. Since annotations 
must be verified in the resulting VC this approach is sound. 

6 Conclusion 

As we expected, the deep embedded safety logic was harder to instantiate within our
PCC framework than the shallow one. One has to define explicit evaluation and
substitution functions and prove them correct. This becomes a nontrivial task when
variable renamings are involved. In addition one has to deal with subtle effects
that pointer instructions or procedure calls have on formulas. However, once the
deep embedding is proven correct it buys us a lot. We can specify and prove correct
an optimiser or pre-prover for VCs and handle more instructions (computed gotos).
Homeier [11] also works with a deep embedded assertion language and points out
similar advantages. Based on these experiences we instantiated our PCC framework
to a down-sized version of the Java Virtual Machine [9] using an extended version
of our deep embedded assertion language.
Abstract. Term algebras have wide applicability in computer science. Un-
fortunately, the decision problem for term algebras has a nonelementary
lower bound, which makes the theory and any extension of it intractable
in practice. However, it is often more appropriate to consider the bounded
class, in which formulae can have arbitrarily long sequences of quantifiers
but the quantifier alternation depth is bounded. In this paper we present
new quantifier elimination procedures for the first-order theory of term
algebras and for its extension with integer arithmetic. The elimination
procedures deal with a block of quantifiers of the same type in one step.
We show that for the bounded class of at most k quantifier alternations,
regardless of the total number of quantifiers, the complexity of our pro-
cedures is k-fold exponential (resp. 2k-fold exponential) for the theory of
term algebras (resp. for the extended theory with integers).



1 Introduction 

The theory of term algebras, also known as the theory of finite trees, axiomatizes 
the Herbrand universe. It has wide applicability in computer science. In pro- 
gramming languages many so-called recursive data structures can be modeled 
as term algebras [19]; in theorem proving it is essential to the unification and 
disunification problem [18, 3]; in logic programming, it is used to define formal 
semantics [14]. Other applications can be found in computational linguistics, 
constraint databases, pattern matching and type theory. 

In this paper we consider an arithmetic extension of the theory of term
algebras. Our extended language has two sorts: the integer sort Z and the term
sort TA. Intuitively, the language is the set-theoretic union of the language of term
algebras and the language of Presburger arithmetic plus the additional length
function (.)^L : TA → Z. Formulae are formed from term literals and integer
literals using logical connectives and quantifications. Term literals are exactly
those literals in the language of term algebras. Integer literals are those that
can be built up from integer variables (including the length function applied to
TA-terms), the usual arithmetic relations and functions. This type of arithmetic
extension has been used in [10,11] to show that the quantifier-free theory of
term algebras with Knuth-Bendix order is NP-complete.

Our interest originates from program verification as term algebras can model 
a wide range of tree-like data structures. Examples include lists, stacks, counters, 
trees, records and queues. To verify programs containing these data structures 
we must be able to reason about these data structures. However, in program 
verification decision procedures for a single theory are usually not applicable 
as programming languages often involve multiple data domains, resulting in 
verification conditions that span multiple theories. A common example of such 
"mixed" constraints are combinations of data structures with integer constraints 
on the size of those structures. In [24] we gave a quantifier-elimination procedure 
for this extended theory. 

Unfortunately the theory of term algebras has nonelementary time com- 
plexity [7, 3, 22], which makes the theory and any extension of it intractable in 
practice. However, as observed by many [20, 8], in consideration of the complex- 
ity of logic theories, the meaning of a formula soon becomes incomprehensible 
as the number of quantifier alternations increases. In practice we rarely deal with 
formulae with a large quantifier alternation depth. Therefore it is worthwhile 
to investigate the class of formulae which can have arbitrarily long sequences 
of quantifiers of the same kind while the total number of quantifier alternations 
is bounded by a constant number. We call such formulae alternation bounded. 

In this paper we present new quantifier elimination procedures for the theory 
of term algebras as well as the extended theory with integers. Our procedures 
can eliminate a block of quantifiers of the same kind in one step. For the bounded 
class of at most k quantifier alternations, regardless of the total number of 
quantifiers, the complexity is k-fold exponential (resp. 2k-fold exponential) for
the theory of term algebras (resp. for the extended theory with integers). 

Related Work and Comparison. Presburger arithmetic (PA) was first shown
to be decidable in 1929 by the quantifier elimination method [6]. Efficient algo-
rithms were later discovered by Cooper et al. [5, 20]. It was shown in [20] and [8],
respectively, that the upper bound and the lower bound of the bounded class in
the theory of PA are one exponential lower than those of the whole theory.

The decidability of the first-order theory of term algebras was first shown by 
Mal'cev using quantifier elimination [17]. This result was reproved in different 
settings [16,3,9,2,1,21,13,12,24]. The lower bound of any theory of pairing 
functions was shown to be nonelementary in [7]; this result was strengthened in 
[4] to a hereditarily nonelementary lower bound. This lower bound complexity 
applies to the theory of term algebras as term algebras with a binary constructor 
can express pairing functions. Using techniques in [4], [22] showed that theories 
of finite trees, infinite and rational trees are all hereditarily nonelementary. 

Quantifier elimination has been used to obtain decidability results for var- 
ious extensions of term algebras. [16] showed the decidability of the theory 
of infinite and rational trees. [2] presented an elimination procedure for term 
algebras with membership predicate in the regular tree language. [1] presented 
an elimination procedure for structures of feature trees with arity constraints. 
[21] showed the decidability of term algebras with queues. [13] showed the 
decidability of term powers, which generalize products and term algebras. [24] 
extended the quantifier elimination procedure in [9] for term algebras with 
length function. 

Traditionally, methods for quantifier elimination for term algebras follow 
one of two approaches: they either perform transformations in the constructor 
language [17, 3, 16, 12], or they work in the selector language [9, 21]. In the first 
approach formulae are reduced to a boolean combination of a specific kind of 
formulae called "solved forms", which include ordinary literals. In this respect 
[3] is essentially a dual of [16] with the special formulae being universally 
quantified. In [12] selectors are used to convert solved forms to quantifier-free 
formulae. In the second approach, formulae are transformed into a form in 
which the quantified variable is not embedded in selectors and only occurs in 
disequalities. Methods following the first approach can deal with a block of 
quantifiers of the same type in one step. They all rely on the "independence 
lemma" ([17], page 277, also see Thm. 1 in this paper) which states that "there 
are enough elements to satisfy a certain set of disequalities and equalities." 
However, this does not hold in the language with finite signature and length 
function. Methods following the second approach can only handle a single 
quantifier at a time. 

Our elimination procedures are carried out in the language with both selec- 
tors and constructors. The method combines the extraction of integer constraints 
from term constraints with a reduction of quantifiers on term variables to quan- 
tifiers on integer variables. 

Paper Organization. Section 2 provides the preliminaries: it introduces the no- 
tation and terminology. Section 3 defines term algebras. Section 4 describes a 
new elimination procedure for the theory of term algebras. Section 5 introduces 
the theory of term algebras with integer arithmetic and presents the technical 
machinery for handling the length function. Section 6 presents the main contri- 
bution of this paper: it expands the elimination procedure in Section 4 for the 
extended theory with integers. Section 7 concludes with some ideas for future 
work. Due to space limitations, all proofs have been omitted from this paper. 
They are available for reference in the extended version of this paper at the first 
author's website. 

2 Preliminaries 

We assume the first-order syntactic notions of variables, parameters and 
quantifiers, and semantic notions of structures, satisfiability and validity as 
in [6]. We explain concepts and terminology important to this paper as follows. 

A signature $\Sigma$ is a set of parameters (function symbols and predicate 
symbols), each of which is associated with an arity. The function symbols with 
arity 0 are also called constants. The set of $\Sigma$-terms 
$\mathcal{T}(\Sigma, \mathcal{X})$ is recursively defined by: (i) every constant 
$c \in \Sigma$ or variable $x \in \mathcal{X}$ is a term, and (ii) if 
$f \in \Sigma$ is an $n$-place function symbol and $t_1, \ldots, t_n$ are terms, 
then $f(t_1, \ldots, t_n)$ is a term. If $\varphi$ is a formula, we use 
$\mathcal{T}(\varphi)$ to denote the set of terms occurring in $\varphi$. The 
length of a term $t$, written $\mathrm{len}(t)$, is defined recursively by: 
(i) for any constant $a$, $\mathrm{len}(a) = 1$, and (ii) for a term 
$\alpha(t_1, \ldots, t_k)$, 
$\mathrm{len}(\alpha(t_1, \ldots, t_k)) = \sum_{i=1}^{k} \mathrm{len}(t_i) + 1$. 
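The recursive definition of term length can be made concrete with a small sketch. The `Term` class and the symbol names `nil` and `cons` below are illustrative assumptions, not notation fixed by the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    """A ground term: a symbol applied to zero or more argument terms (hypothetical encoding)."""
    symbol: str
    args: Tuple["Term", ...] = ()


def term_len(t: Term) -> int:
    """len(a) = 1 for a constant; len(alpha(t1..tk)) = sum of len(ti) + 1."""
    return 1 + sum(term_len(arg) for arg in t.args)


nil = Term("nil")
nested = Term("cons", (nil, Term("cons", (nil, nil))))
```

Under this encoding the length of a term is simply its number of nodes, so `nested` above has length 5.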

An atomic formula (atom) is a formula of the form $P(t_1, \ldots, t_n)$ where 
$P$ is an $n$-place predicate symbol and $t_1, \ldots, t_n$ are terms (equality 
is treated as a binary predicate symbol). A literal is an atomic formula or its 
negation. A variable occurs free in a formula if it is not in the scope of a 
quantifier. A formula without quantifiers is called quantifier-free. A ground 
formula is a formula with no variables. A sentence is a formula in which no 
variable occurs free. Every quantifier-free formula can be put into disjunctive 
normal form, that is, a disjunction of conjunctions of literals. 

We use $\bar{x}$ to denote a set of variables, say, $x_1, \ldots, x_n$, and 
$\exists \bar{x}$ (resp. $\forall \bar{x}$) as an abbreviation of 
$\exists x_1, \ldots, \exists x_n$ (resp. $\forall x_1, \ldots, \forall x_n$). 
When we write $\varphi(\bar{x})$, we mean that $\bar{x}$ occur free in 
$\varphi$. Any formula $\varphi$ can be put into prenex form 
$Q_1 x_1, \ldots, Q_n x_n\, \varphi(\bar{x})$, where the $Q_i$'s are either 
$\exists$ or $\forall$ and $\varphi(\bar{x})$ is quantifier-free. We call 
$\varphi(\bar{x})$ the matrix of $\varphi$. We say that $\varphi$ has quantifier 
(alternation) depth $m$ if $Q_1, \ldots, Q_n$ can be divided into $m$ blocks 
such that all quantifiers in a block are of the same type and quantifiers in two 
consecutive blocks are different. 
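As a small illustration of quantifier (alternation) depth, the sketch below counts maximal blocks of identical quantifiers in a prenex prefix. Encoding the prefix as a string over `E` (exists) and `A` (forall) is an assumption made only for this example.

```python
from itertools import groupby


def alternation_depth(prefix: str) -> int:
    """Number of maximal blocks of identical quantifiers in a prenex prefix.

    `prefix` is a string over 'E' (exists) and 'A' (forall), a hypothetical
    encoding of Q_1 ... Q_n.
    """
    return sum(1 for _ in groupby(prefix))
```

For instance, the prefix `EEAAE` splits into the blocks `EE`, `AA`, `E`, so its depth is 3.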

A $\Sigma$-structure (or $\Sigma$-interpretation) $\mathfrak{A}$ is a tuple 
$\langle A, I \rangle$ where $A$ is a non-empty domain and $I$ is a function 
that associates each $n$-place function symbol $f$ (resp. predicate symbol $P$) 
with an $n$-place function $f^{\mathfrak{A}}$ (resp. relation 
$P^{\mathfrak{A}}$) on $A$. We usually denote $\mathfrak{A}$ by 
$\langle A; \Sigma \rangle$, which is called the signature of $\mathfrak{A}$. 

A variable assignment $\sigma$ (or variable valuation) is a function that 
assigns each variable an element of $A$. We use $[\![x]\!]_\sigma$ to denote the 
value assigned to $x$ under $\sigma$. We write $[\![x]\!]$ when $\sigma$ is 
clear from the context. The truth value of a formula is determined by an 
interpretation and a variable assignment. 

A formula $\varphi$ is satisfiable (or consistent) if it is true under some 
variable assignment; it is unsatisfiable (or inconsistent) otherwise. A formula 
$\varphi$ is valid if it is true under every variable assignment. A formula 
$\varphi$ is valid if and only if $\neg\varphi$ is unsatisfiable. 

By the theory of a structure $\mathfrak{A}$, written $Th(\mathfrak{A})$, we 
shall mean the class of all valid sentences in $\mathfrak{A}$. We use 
$BC_k(\mathfrak{A})$ to denote the subclass of $Th(\mathfrak{A})$ in which all 
sentences have at most $k$ quantifier alternations. 

A theory T is said to admit quantifier elimination if any formula can be equiv- 
alently (modulo T) and effectively transformed into a quantifier-free formula. 
If a theory admits quantifier elimination, then every sentence is reducible to a 
ground formula. Therefore, if ground literals are decidable, then a quantifier 
elimination procedure becomes a decision procedure. 

Presburger arithmetic (PA) is the first-order theory of addition in the 
arithmetic of integers. The corresponding language and structure are denoted, 
respectively, by $\mathcal{L}_{\mathbb{Z}}$ and 
$\mathfrak{A}_{\mathbb{Z}} = \langle \mathbb{Z}; 0, +, < \rangle$. 

We define $\exp_0(f(n)) = f(n)$ and $\exp_{k+1}(f(n)) = 2^{\exp_k(f(n))}$. 

3 Term Algebras 

We present a general language and structure of term algebras. For simplicity, 
we do not distinguish syntactic terms in the language from semantic terms in 
the corresponding structure. The meaning should be clear from the context. 

Definition 1. A term algebra 
$\mathfrak{A}_{TA} : \langle TA; \mathcal{A}, \mathcal{C}, \mathcal{S}, \mathcal{T} \rangle$ 
consists of 

1. $TA$: The term domain, which consists of all terms built up from constants by 
applying constructors. Elements in $TA$ are called TA-terms. 

2. $\mathcal{A}$: A finite set of constants: $a, b, c, \ldots$ 

3. $\mathcal{C}$: A finite set of constructors: $\alpha, \beta, \gamma, \ldots$ 
The arity of $\alpha$ is denoted by $ar(\alpha)$. An object is $\alpha$-typed 
(or an $\alpha$-term) if its outmost constructor is $\alpha$. 

4. $\mathcal{S}$: A finite set of selectors. For a constructor $\alpha$ with 
arity $k$, there are $k$ selectors $s_1^\alpha, \ldots, s_k^\alpha$ in 
$\mathcal{S}$. For a term $x$, $s_i^\alpha(x)$ returns the $i$-th component of 
$x$ if $x$ is an $\alpha$-term and $x$ itself otherwise. 

5. $\mathcal{T}$: A finite set of testers. For each constructor $\alpha$ there 
is a corresponding tester $\mathrm{Is}_\alpha$. For a term $x$, 
$\mathrm{Is}_\alpha(x)$ is true if and only if $x$ is an $\alpha$-term. In 
addition there is a special tester $\mathrm{Is}_{\mathcal{A}}$ such that 
$\mathrm{Is}_{\mathcal{A}}(x)$ is true if and only if $x$ is a constant. Note 
that there is no need for individual constant testers as $x = a$ serves as 
$\mathrm{Is}_a(x)$. 

We denote by $\mathcal{L}_{TA}$ the language of $\mathfrak{A}_{TA}$. 

Unless mentioned otherwise, in this paper we assume that $\mathcal{L}_{TA}$ is 
finite. However, the techniques presented here can be modified to handle the 
case of infinite languages. In particular, the decision problems become 
considerably easier if we allow $\mathcal{L}_{TA}$ to have infinitely many 
constants. We leave the detailed discussion to the extended version of this 
paper. 

The theory of term algebras is axiomatizable as follows [9]. 

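Proposition 1 (Axiomatization of Term Algebras [9]). Let $\bar{z}_\alpha$ be 
$z_1, \ldots, z_{ar(\alpha)}$. The following formula schemes, in which variables 
are implicitly universally quantified over $TA$, axiomatize 
$Th(\mathfrak{A}_{TA})$. 

A. $t(x) \neq x$, if $t$ is built solely by constructors and $t$ properly 
contains $x$. 

B. $a \neq b$, $a \neq \alpha(x_1, \ldots, x_{ar(\alpha)})$, and 
$\alpha(x_1, \ldots, x_{ar(\alpha)}) \neq \beta(y_1, \ldots, y_{ar(\beta)})$, if 
$a$ and $b$ are distinct constants and if $\alpha$ and $\beta$ are distinct 
constructors. 

C. $\alpha(x_1, \ldots, x_{ar(\alpha)}) = \alpha(y_1, \ldots, y_{ar(\alpha)}) \rightarrow \bigwedge_{1 \le i \le ar(\alpha)} x_i = y_i$. 

D. $\mathrm{Is}_\alpha(x) \leftrightarrow \exists \bar{z}_\alpha\, (\alpha(\bar{z}_\alpha) = x)$; 
$\mathrm{Is}_{\mathcal{A}}(x) \leftrightarrow \bigwedge_{\alpha \in \mathcal{C}} \neg \mathrm{Is}_\alpha(x)$. 

E. $s_i^\alpha(x) = y \leftrightarrow \big(\exists \bar{z}_\alpha\, (\alpha(\bar{z}_\alpha) = x \wedge y = z_i)\big) \vee \big(\forall \bar{z}_\alpha\, (\alpha(\bar{z}_\alpha) \neq x) \wedge x = y\big)$. 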

In general selectors and testers can be defined by constructors and vice 
versa. One direction has been shown by (D) and (E), which are pure definitional 
axioms. 

Example 1. Consider the LISP list structure 

$\mathfrak{A}_{list} = \langle list; \{nil\}, \{cons\}, \{car, cdr\}, \{\mathrm{Is}_{cons}\} \rangle$ 

where $list$ denotes the domain, $nil$ denotes the empty list, $cons$ is the 
2-place constructor (pairing function) and $car$ and $cdr$ are the corresponding 
left and right selectors (projectors) respectively. It is not difficult to 
verify that $\mathfrak{A}_{list}$ is an instance of term algebras. 

We use the notation $\alpha = (s_1^\alpha, \ldots, s_k^\alpha)$ to mean that 
$\alpha$ is a constructor with $ar(\alpha) = k$ and 
$s_1^\alpha, \ldots, s_k^\alpha$ are the corresponding selectors of $\alpha$. We 
call a term $t$ a constructor term (resp. selector term) if the outmost function 
symbol of $t$ is a constructor (resp. a selector). We assume that no constructor 
term appears inside selectors as simplification can always be done. For example, 
$s_i^\alpha(\alpha(x_1, \ldots, x_k))$ simplifies to $x_i$ ($1 \le i \le k$) and 
$s_i^\beta(\alpha(x_1, \ldots, x_k))$ simplifies to $\alpha(x_1, \ldots, x_k)$ 
for $\beta \neq \alpha$. We use $L, M, N, \ldots$ to denote selector sequences. 
If $L = s_1, \ldots, s_n$, then $Lx$ is an abbreviation for 
$s_1(\ldots(s_n(x))\ldots)$. We say a selector term $s_i^\alpha(t)$ is proper if 
$\mathrm{Is}_\alpha(t)$ holds. We can make selector terms proper with type 
information. 

Definition 2 (Type Completion). $\theta'$ is a type completion of $\theta$ if 
$\theta'$ is obtained from $\theta$ by adding tester predicates such that for 
any term $s(t)$ exactly one literal of the form $\mathrm{Is}_\alpha(t)$ 
($\alpha \in \mathcal{C}$) or $\mathrm{Is}_{\mathcal{A}}(t)$ is present in 
$\theta'$. 

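Example 2. Let $\alpha = (s_1^\alpha, s_2^\alpha)$. A possible type completion 
for $y = s_1^\alpha s_2^\alpha x$ is 
$y = s_1^\alpha s_2^\alpha x \wedge \mathrm{Is}_\alpha(x) \wedge \mathrm{Is}_{\mathcal{A}}(s_2^\alpha x)$. 
With this type information we can simplify $y = s_1^\alpha s_2^\alpha x$ to 
$y = s_2^\alpha x$ by Axioms (D) and (E) in Prop. 1. 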

4 A New Quantifier Elimination Procedure for $Th(\mathfrak{A}_{TA})$ 

In this section we present a new quantifier elimination algorithm for the theory 
of term algebras and show that the algorithm only needs exponential time to 
eliminate a block of quantifiers of the same kind. The algorithm works mainly in 
the constructor language while using selectors as auxiliary tools. The algorithm 
is also the basis for the elimination procedure for the extended theory 
presented in Section 6. 

Normal Form. It is well-known that eliminating arbitrary quantifiers reduces 
to eliminating existential quantifiers from formulae in the form 

$\exists \bar{x}\, (A_1(\bar{x}) \wedge \ldots \wedge A_n(\bar{x}))$,   (1) 

where $A_i(\bar{x})$ ($1 \le i \le n$) are literals [9]. We can also assume that 
the $A_i$'s are not of the form $x = t$, as 
$\exists x\, (x = t \wedge \theta(x, \bar{y}))$ simplifies to 
$\theta(t, \bar{y})$ if $x$ does not occur in $t$, to 
$\exists x\, \theta(x, \bar{y})$ if $t \equiv x$, and to false by Axiom (A) if 
$t$ is a term which is built solely by constructors and properly contains $x$. 

Nondeterminism. In this paper all transformations are done on formulae of the 
form (1). Whenever we say "guess $\theta$", we mean to add a valid disjunction 
$\bigvee_i \theta_i$ (where $\theta$ is one of the disjuncts) to the matrix of 
(1). When we replace $\theta$ by $\bigvee_i \theta_i$ or directly introduce 
$\bigvee_i \theta_i$, it should be understood that an implicit disjunctive 
splitting is carried out and we work on each resultant disjunct in the form (1) 
"simultaneously". 

Simplification. For simplicity, in the description of algorithms, we omit tester 
literals unless they are needed for the correctness proof. We may also assume 
that the matrix of (1) is type complete and basic simplifications are carried 
out whenever applicable. For example, for a nonempty selector sequence $L$, we 
replace $Lx \neq x$ by true and $Lx = x$ by false. Similarly for 
$t(x) \neq x$ and $t(x) = x$ where $t(x)$ is a term properly containing $x$. 

Notation. In the algorithm we use the following notation: $\bar{x}$ denotes the 
set of existentially quantified variables; $\bar{y}$ denotes the set of 
(implicitly) universally quantified parameters; $s, t, u$ denote TA-terms; 
$G, H$ denote (possibly empty) selector blocks; $f, g, h$ denote index functions 
with ranges clear from the context; numerical superscripts are parenthesized. 
Index functions are used to differentiate multiple occurrences of the same 
variables. 

Note that in each step the algorithm manipulates the formula 
$\exists \bar{x} : \theta(\bar{x}, \bar{y})$ to produce a version of the same 
form (or multiple versions of the same form in case disjunctions are 
introduced), and thus in each step $\exists \bar{x} : \theta(\bar{x}, \bar{y})$ 
refers to the updated version rather than to the original input formula. 

Definition 3 (Solved Form). We say $\theta_{TA}(\bar{x}, \bar{y})$ is in the 
solved form (with respect to $\bar{x}$) if $\bar{x}$ are not in equalities, not 
asserted to be constants and not inside selector terms. We say 
$\exists \bar{x}\, \theta_{TA}(\bar{x}, \bar{y})$ is in the solved form if 
$\theta_{TA}(\bar{x}, \bar{y})$ is. 

The elimination goes as follows. A sequence of equivalence-preserving 
transformations will bring the input formula into a disjunction of formulae in 
the solved form which have solutions under any instantiation of parameters. 
Therefore, the whole block of existential quantifiers $\exists \bar{x}$ can be 
eliminated by removing all literals containing $\bar{x}$ in the matrix. 

Algorithm 1. Input: $\exists \bar{x} : \theta(\bar{x}, \bar{y})$. 

1. Type Completion. Guess a type completion of $\theta(\bar{x}, \bar{y})$ and 
simplify every selector term to a proper one. 

2. Elimination of Selector Terms Containing $\bar{x}$. Replace all selector 
terms containing $\bar{x}$ by the corresponding equivalent constructor terms 
according to Axiom (E). For example, $s_1^\alpha x = y$ becomes 
$\exists z_2, \ldots, z_k\; \alpha(y, z_2, \ldots, z_k) = x$ for 
$ar(\alpha) = k$. It may increase the number of existential quantifiers, but 
leaves parameters unchanged. From now on, $\bar{x}$ never appear inside selector 
terms. 

3. Elimination of Equalities between Constructor Terms. Replace 

$\alpha(t_1, \ldots, t_k) = \alpha(t'_1, \ldots, t'_k)$   (2) 

by $\bigwedge_{1 \le i \le k} t_i = t'_i$. Repeat until no equality of the form 
(2) appears. 

4. Elimination of Disequalities between Constructor Terms. Replace 

$\alpha(t_1, \ldots, t_k) \neq \alpha(t'_1, \ldots, t'_k)$   (3) 

by $\bigvee_{1 \le i \le k} t_i \neq t'_i$. Repeat until no disequality of the 
form (3) appears. At this point we may assume that each disjunct (that has not 
been simplified to false) is in the form 

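$\exists \bar{x} : \big[ \bigwedge_i x_{f(i)} \neq t_i(\bar{x}, \bar{y}) \wedge \bigwedge_i G_i y_{g(i)} \neq s_i(\bar{x}, \bar{y}) \wedge \bigwedge_i H_i y_{h(i)} = u_i(\bar{x}, \bar{y}) \big]$.   (4) 

5. Elimination of Equalities Containing $\bar{x}$. Solve equations of the form 
$H_i y_{h(i)} = u_i(\bar{x}, \bar{y})$, where $u_i(\bar{x}, \bar{y})$ is a 
constructor term containing $\bar{x}$, in terms of $H_i y_{h(i)}$ such that the 
result is a set of equations in the selector language. For example, with 
$\alpha = (s_1^\alpha, s_2^\alpha)$, the solution set of 
$s_1^\alpha y = \alpha(\alpha(x_1, y_1), y_2)$ is 

$\{\, x_1 = s_1^\alpha s_1^\alpha s_1^\alpha y,\;\; y_1 = s_2^\alpha s_1^\alpha s_1^\alpha y,\;\; y_2 = s_2^\alpha s_1^\alpha y \,\}$. 

Solving $\bigwedge_i H_i y_{h(i)} = u_i(\bar{x}, \bar{y})$ and eliminating all 
$x$'s occurring in solved equations, we obtain 

$\exists \bar{x} : \big[ \bigwedge_i x_{f^{(1)}(i)} \neq t_i^{(1)}(\bar{x}, \bar{y}) \wedge \bigwedge_i G_i^{(1)} y_{g^{(1)}(i)} \neq s_i^{(1)}(\bar{x}, \bar{y}) \wedge \bigwedge_i H_i^{(2)} y_{h^{(2)}(i)} = H_i^{(3)} y_{h^{(3)}(i)} \big]$.   (5) 

6. Elimination of Constants. If for some $x \in \bar{x}$, 
$\mathrm{Is}_{\mathcal{A}}(x)$ appears in (5), we instantiate $x$ to each 
constant to eliminate $\exists x$. We still use (5) to denote the resulting 
formula. 

7. Elimination of Quantifiers. Rewrite 
$\bigwedge_i G_i^{(1)} y_{g^{(1)}(i)} \neq s_i^{(1)}(\bar{x}, \bar{y})$ as 

$\bigwedge_i G_i^{(2)} y_{g^{(2)}(i)} \neq s_i^{(2)}(\bar{x}, \bar{y}) \wedge \bigwedge_i G_i^{(3)} y_{g^{(3)}(i)} \neq s_i^{(3)}(\bar{y})$, 

where $\bar{x}$ do not appear in $s_i^{(3)}(\bar{y})$. Then (5) can be rewritten 
as 

$\exists \bar{x} : \big[ \bigwedge_i x_{f^{(1)}(i)} \neq t_i^{(1)}(\bar{x}, \bar{y}) \wedge \bigwedge_i G_i^{(2)} y_{g^{(2)}(i)} \neq s_i^{(2)}(\bar{x}, \bar{y}) \big] \wedge \bigwedge_i G_i^{(3)} y_{g^{(3)}(i)} \neq s_i^{(3)}(\bar{y}) \wedge \bigwedge_i H_i^{(2)} y_{h^{(2)}(i)} = H_i^{(3)} y_{h^{(3)}(i)}$.   (6) 

We claim that 

$\exists \bar{x} : \big[ \bigwedge_i x_{f^{(1)}(i)} \neq t_i^{(1)}(\bar{x}, \bar{y}) \wedge \bigwedge_i G_i^{(2)} y_{g^{(2)}(i)} \neq s_i^{(2)}(\bar{x}, \bar{y}) \big]$   (7) 

is valid and hence (6) is equivalent to 

$\bigwedge_i G_i^{(3)} y_{g^{(3)}(i)} \neq s_i^{(3)}(\bar{y}) \wedge \bigwedge_i H_i^{(2)} y_{h^{(2)}(i)} = H_i^{(3)} y_{h^{(3)}(i)}$.   (8) 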
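Step 5 of Alg. 1 can be illustrated by a small sketch that solves an equation between a selector term and a constructor term, producing equations in the selector language. The string rendering of selector terms and the function name `solve` are assumptions for illustration only; tester literals, which the type completion of Step 1 would supply, are omitted.

```python
def solve(selector_term, pattern):
    """Decompose `selector_term = pattern` into equations `leaf = selector term`.

    `pattern` is either a leaf (a string naming a variable or constant) or a
    tuple (alpha, child_1, ..., child_k) for a constructor term. Selector terms
    are rendered as strings such as "s1^a(y)" (hypothetical notation).
    """
    if isinstance(pattern, str):
        return [(pattern, selector_term)]
    alpha, *children = pattern
    equations = []
    for i, child in enumerate(children, start=1):
        # The i-th child of the pattern equals the i-th selector of the left side.
        equations += solve(f"s{i}^{alpha}({selector_term})", child)
    return equations


# s_1 y = a(a(x1, y1), y2)
solution = solve("s1^a(y)", ("a", ("a", "x1", "y1"), "y2"))
```

The result lists one equation per leaf of the constructor term; the equation for `x1` is the solved equation used to eliminate that quantified variable, while the remaining ones relate selector terms of parameters.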



Theorem 1. All transformations in Alg. 1 preserve equivalence. 

Theorem 2. Alg. 1 eliminates a block of quantifiers in exponential time. 

Theorem 3. $BC_k(\mathfrak{A}_{TA})$ is decidable in $O(\exp_k(n))$. 

5 Term Algebras with Length Function 

In this section we introduce the extended theory and present the technical 
machinery needed to handle lengths of TA-terms in the elimination procedure. 

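Definition 4. The structure of the extended language is 
$\mathfrak{A}^{\mathbb{Z}}_{TA} = \langle \mathfrak{A}_{TA}; \mathfrak{A}_{\mathbb{Z}}; (\cdot)^L : TA \rightarrow \mathbb{Z} \rangle$ 
where $\mathfrak{A}_{TA}$ is a term algebra, $\mathfrak{A}_{\mathbb{Z}}$ is 
Presburger arithmetic, and $(\cdot)^L$ denotes the length function; for a term 
$t$, $t^L = \mathrm{len}(t^{\mathfrak{A}_{TA}})$. We denote by 
$\mathcal{L}^{\mathbb{Z}}_{TA}$ the language of $\mathfrak{A}^{\mathbb{Z}}_{TA}$. 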

We call terms of sort $TA$ (resp. $\mathbb{Z}$) TA-terms (resp. integer terms), 
similarly for constants, variables, quantifiers and formulae. We also use "term" 
for "TA-term" when there is no confusion. A TA-term can occur inside the length 
function. We call this type of occurrence integer occurrence to distinguish it 
from the normal term occurrence. 

If $\bar{t}$ is a set of TA-terms, we use $\bar{t}^L$ to denote the set of all 
integer occurrences, in the context, of the form $(Lt)^L$ where 
$t \in \bar{t}$ and $L$ denotes a (possibly empty) block of selectors. 

Example 3. The formula 
$\exists x \exists y : TA\, (x \neq y \wedge x^L = y^L)$ states that there exist 
at least two distinct terms $t_1, t_2 \in TA$ such that 
$\mathrm{len}(t_1) = \mathrm{len}(t_2)$. Note that the first occurrence of $x$ 
is an ordinary term while the second one is integral. The same holds for the 
occurrences of $y$. 

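Instead of writing $n = t^L$ to indicate the connection between term variables 
and the corresponding integer variables, we abuse the notation a bit by using 
$t^L$ as formal variables directly in Presburger formulae. For example, 
$\forall x^L{:}\mathbb{Z}\; \theta_{\mathbb{Z}}(x^L) \rightarrow \exists x{:}TA\; \theta_{TA}(x)$ 
stands for 
$\forall x^L{:}\mathbb{Z}\, \big[\theta_{\mathbb{Z}}(x^L) \rightarrow \exists x{:}TA\; \theta_{TA}(x)\big]$, 
which in turn is a shorthand for 
$\forall n{:}\mathbb{Z}\, \big[\theta_{\mathbb{Z}}(n) \rightarrow \exists x{:}TA\; (\theta_{TA}(x) \wedge n = x^L)\big]$. 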

5.1 Counting Constraints 

As before, to eliminate $\exists x$ from 
$\exists x : TA\; \theta_{TA}(x, \bar{y})$, we first put 
$\exists x : TA\; \theta_{TA}(x, \bar{y})$ into solved form. However, this alone 
does not suffice as the constraints on the lengths of $x$ may restrict the 
solution set of $x$. 

Example 4. The truth value of 
$\exists x_1 \exists x_2 : TA\, (x_1 \neq x_2 \wedge x_1^L = x_2^L = 3)$ depends 
on the existence of two distinct terms of length 3. 

Hence we need to know the number of distinct TA-terms of a given length. 

Definition 5 (Counting Constraint). A counting constraint is a predicate 
$\mathrm{CNT}^\alpha_{k,n}(x)$ ($k > 0$, $n > 0$) that is true if and only if 
there are at least $n + 1$ different $\alpha$-terms of length $x$ in 
$\mathfrak{A}_{TA}$ with $k$ constants. $\mathrm{CNT}_{k,n}(x)$ is similarly 
defined with $\alpha$-terms replaced by TA-terms. 

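Example 5. For 
$\mathfrak{A}^{\mathbb{Z}}_{list} = \langle \mathfrak{A}_{list}; \mathfrak{A}_{\mathbb{Z}} \rangle$ 
with one constant, $\mathrm{CNT}^{cons}_{1,n}(x)$ is 
$x \ge 2m - 1 \wedge 2 \nmid x$ where $m$ is the least number such that the 
$m$-th Catalan number $C_m = \frac{1}{m}\binom{2m-2}{m-1}$ is greater than $n$. 
This is not surprising as $C_m$ gives the number of binary trees with $m$ leaves 
(such a tree has $2m - 1$ nodes). 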
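The counting constraint of Example 5 can be checked by brute force. The sketch below counts terms over one constant and one binary constructor by length and compares the result with the Catalan characterization; the function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_terms(length: int) -> int:
    """Number of distinct terms of the given length over {nil, cons/2}."""
    if length == 1:
        return 1  # only the constant nil
    # cons(l, r) has length len(l) + len(r) + 1
    return sum(count_terms(i) * count_terms(length - 1 - i)
               for i in range(1, length - 1))


def catalan_leaves(m: int) -> int:
    """C_m = (1/m) * binom(2m-2, m-1): number of binary trees with m leaves."""
    return comb(2 * m - 2, m - 1) // m


def cnt_cons(n: int, x: int) -> bool:
    """CNT^cons_{1,n}(x) for n > 0: x >= 2m - 1 and x odd, m least with C_m > n."""
    m = 1
    while catalan_leaves(m) <= n:
        m += 1
    return x >= 2 * m - 1 and x % 2 == 1
```

Since every term of length at least 3 is a cons-term, `cnt_cons(n, x)` should agree with `count_terms(x) >= n + 1` for all `n > 0`.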

Lemma 1 ([24]). $\mathrm{CNT}^\alpha_{k,n}(x)$ and $\mathrm{CNT}_{k,n}(x)$ are 
expressible in Presburger arithmetic. 

5.2 Equality Completion 

Often formulae do not have all the information required to construct counting 
constraints. Consider the formula 
$\exists x : TA\, (y_1 \neq x \wedge y_2 \neq x \wedge y_1 \neq y_2)$. Without 
knowing equality relations between the lengths of $x$, $y_1$ and $y_2$, we 
cannot find the integer constraint on the length of $x$. So in order to 
construct counting constraints, we need equality information between terms and 
equality information between lengths of terms. 

Definition 6 (Equality Completion). Let $S$ be a set of TA-terms. An equality 
completion $\theta$ of $S$ is a formula consisting of the following literals: 
for any $u, v \in S$, exactly one of $u = v$ and $u \neq v$, and exactly one of 
$u^L = v^L$ and $u^L \neq v^L$ are in $\theta$. 

Let $\theta$ be a conjunction of literals. We say that $\theta'$ is an equality 
completion of $\theta$ if $\theta'$ is a conjunction of an equality completion 
of $\mathcal{T}(\theta)$ and tester literals in $\theta$. We are only interested 
in compatible equality completions, i.e., $\theta$ is a subformula of 
$\theta'$. 

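Example 6. Let $ar(\alpha) = 2$ and $\theta$ be 
$y \neq \alpha(x, z) \wedge \mathrm{Is}_\alpha(y)$; then 
$\mathcal{T}(\theta) = \{x, y, z, \alpha(x, z)\}$. A possible equality 
completion of $\theta$ is 

$\mathrm{Is}_\alpha(y) \wedge y^L = (\alpha(x, z))^L \wedge x^L = z^L \wedge y^L \neq x^L \wedge \bigwedge_{t, t' \in \mathcal{T}(\theta),\, t \not\equiv t'} t \neq t'$.   (9) 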



5.3 Clusters 

Equality completion is an expensive operation and it is hard to maintain if the 
subsequent operations generate new terms (as in Alg. 6). Revisiting 
$\exists x : TA\, (y_1 \neq x \wedge y_2 \neq x \wedge y_1 \neq y_2)$, it is 
easily seen that we need to know whether $y_1 = y_2$ or not only if we have 
guessed $x^L = y_1^L = y_2^L$. In fact it suffices to have the equality 
information between terms of the same length. This leads to the notion of 
clusters. 

Definition 7 (Clusters). Let $[t]$ denote the equivalence class containing $t$ 
with respect to term equality. We say that $C = \{[t_0], \ldots, [t_n]\}$ is a 
cluster if $t_0, \ldots, t_n$ are pairwise distinct terms of the same length. 

For notational simplicity we may assume that a cluster is a set with each member 
being an (arbitrarily chosen) representative of the corresponding equivalence 
class. A cluster is maximal if no superset of it is a cluster. A cluster $C$ is 
closed if $C$ is maximal and for any maximal $C'$, 
$C \cap C' \neq \emptyset \rightarrow C = C'$. Two distinct closed clusters are 
said to be mutually independent. A cluster is $\alpha$-typed (called an 
$\alpha$-cluster) if all of its elements are $\alpha$-typed. The notions of 
maximality, closedness and mutual independence naturally generalize to typed 
clusters. Note that an untyped maximal cluster may contain more than one typed 
maximal cluster. The size of a cluster is the number of equivalence classes in 
it. The rank of a cluster $C$, written $rk(C)$, is the length of its terms. 
Clusters are partially ordered by their ranks. 

Example 7. In Ex. 6, formula (9) induces two mutually independent clusters 
$C_1 : \{[x], [z]\}$ and $C_2 : \{[y], [\alpha(x, z)]\}$ with $C_2$ being 
$\alpha$-typed and $rk(C_1) < rk(C_2)$. In fact any equality completion induces 
a set of mutually independent clusters. As another example, the formula 

$x \neq y \wedge x \neq z \wedge x^L = y^L \wedge x^L = z^L \wedge \mathrm{Is}_\alpha(x) \wedge \mathrm{Is}_\alpha(y)$ 

gives two maximal clusters $C_1 : \{x, y\}$ and $C_2 : \{x, z\}$, with $C_1$ 
being an $\alpha$-cluster. However, neither $C_1$ nor $C_2$ is closed and their 
ranks are incomparable. 
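For ground terms, where all equalities and lengths are known, clusters reduce to groups of distinct terms of equal length. The sketch below covers only this special case, under a hypothetical nested-tuple encoding of terms; the paper's clusters are defined relative to the equality information present in a formula, which this example does not model.

```python
from collections import defaultdict


def term_len(t):
    """Length of a ground term encoded as (symbol, child_1, ..., child_k)."""
    return 1 + sum(term_len(child) for child in t[1:])


def ground_clusters(terms):
    """Group distinct ground terms by length; maps each rank to its cluster."""
    by_rank = defaultdict(set)
    for t in terms:
        by_rank[term_len(t)].add(t)  # duplicates collapse into one class
    return dict(by_rank)


a, b = ("a",), ("b",)
t, u = ("f", a, b), ("f", b, a)
clusters = ground_clusters([a, b, t, u, a])
```

In this ground setting each group is maximal and closed, groups of different ranks are mutually independent, and the size of a cluster is the number of distinct terms in the group.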

5.4 Length Constraint Completion 

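In general, formulae are of the form 
$\exists \bar{x} : TA\, (\theta_{TA}(\bar{x}, \bar{y}) \wedge \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L))$, 
where the lengths of $\bar{x}$ have been constrained by 
$\theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$. For the construction of accurate 
length constraints for $\bar{x}$, we need to make 
$\theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ "complete" in the sense defined 
below. 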

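Definition 8 (Length Constraint Completion). Let 
$\theta_{TA}(\bar{x}, \bar{y})$ be a formula of $\mathcal{L}_{TA}$ and 
$\theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ be a formula of 
$\mathcal{L}_{\mathbb{Z}}$. Write $\theta_{TA}(\bar{x}, \bar{y})$ as 
$\theta^{(1)}_{TA}(\bar{x}, \bar{y}) \wedge \theta^{(2)}_{TA}(\bar{y})$ such 
that $\theta^{(2)}_{TA}(\bar{y})$ does not contain $\bar{x}$. We say a formula 
$\Theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ is a completion of 
$\theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ in $\bar{x}$ with respect to 
$\theta_{TA}(\bar{x}, \bar{y})$ if the following formulae are valid: 

I. $\forall \bar{y} : TA\; \forall \bar{x} : TA\, \big[\theta_{TA}(\bar{x}, \bar{y}) \wedge \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L) \rightarrow \theta_{TA}(\bar{x}, \bar{y}) \wedge \Theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)\big]$ 

II. $\forall \bar{y} : TA\; \forall \bar{x}^L : \mathbb{Z}\, \big[\theta^{(2)}_{TA}(\bar{y}) \wedge \Theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L) \rightarrow \exists \bar{x} : TA\, (\theta_{TA}(\bar{x}, \bar{y}) \wedge \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L))\big]$. 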

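Example 8. Let $ar(\alpha) = 2$, $\bar{x} = \{x_1, x_2, x_3\}$, 
$\bar{y} = \emptyset$, $\theta_{TA}(x_1, x_2, x_3)$ be 
$\alpha(x_1, x_2) = x_3$ and $\theta_{\mathbb{Z}}(x_1^L, x_2^L, x_3^L)$ be 
$x_1^L < x_3^L \wedge x_2^L < x_3^L$. Consider the following formulae: 

$\Theta^{(1)}_{\mathbb{Z}} : x_1^L + x_2^L + 1 = x_3^L \wedge x_1^L > 0 \wedge x_2^L > 0$, 

$\Theta^{(2)}_{\mathbb{Z}} : x_1^L < x_3^L \wedge x_2^L < x_3^L \wedge x_1^L > 0 \wedge x_2^L > 0$, 

$\Theta^{(3)}_{\mathbb{Z}} : x_1^L + x_2^L + 1 = x_3^L \wedge x_1^L > 5 \wedge x_2^L > 5$. 

It is not hard to argue that $\Theta^{(1)}_{\mathbb{Z}}$ is a completion of 
$\theta_{\mathbb{Z}}(x_1^L, x_2^L, x_3^L)$ in $\bar{x}$ with respect to 
$\theta_{TA}(x_1, x_2, x_3)$. However neither $\Theta^{(2)}_{\mathbb{Z}}$ nor 
$\Theta^{(3)}_{\mathbb{Z}}$ is such a completion. Though 
$\Theta^{(2)}_{\mathbb{Z}}$ satisfies [I], it does not satisfy [II], as the 
assignment $\{x_1^L = 3, x_2^L = 3, x_3^L = 4\}$ cannot be realized by any 
assignment for $\bar{x}$. On the other hand, $\Theta^{(3)}_{\mathbb{Z}}$ 
satisfies [II], but not [I], as the assignment 
$\{x_1 = a, x_2 = a, x_3 = \alpha(a, a)\}$, where $a$ is a constant, falsifies 
$\Theta^{(3)}_{\mathbb{Z}}$. 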
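The two counterexamples in Example 8 can be checked mechanically. In the sketch below the three candidate constraints are written as Python predicates over the lengths (the names `theta1` to `theta3` and `realizable` are illustrative), and `realizable` encodes the necessary condition that any terms with $\alpha(x_1, x_2) = x_3$ satisfy $x_1^L + x_2^L + 1 = x_3^L$.

```python
def theta1(l1, l2, l3):
    return l1 + l2 + 1 == l3 and l1 > 0 and l2 > 0


def theta2(l1, l2, l3):
    return l1 < l3 and l2 < l3 and l1 > 0 and l2 > 0


def theta3(l1, l2, l3):
    return l1 + l2 + 1 == l3 and l1 > 5 and l2 > 5


def realizable(l1, l2, l3):
    """Necessary condition for lengths of terms with alpha(x1, x2) = x3."""
    return l1 + l2 + 1 == l3


# Condition [II] fails for the second candidate: it accepts lengths no terms realize.
second_accepts_unrealizable = theta2(3, 3, 4) and not realizable(3, 3, 4)

# Condition [I] fails for the third candidate: x1 = x2 = a, x3 = alpha(a, a)
# has lengths (1, 1, 3), which satisfy the original constraint but not theta3.
third_rejects_real_terms = realizable(1, 1, 3) and 1 < 3 and not theta3(1, 1, 3)
```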

For the construction of length constraint completion, we require that 
$\theta_{TA}(\bar{x}, \bar{y}) \wedge \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ 
induce a set of closed clusters and be in a special form defined below. 

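Definition 9. We say 
$\theta_{TA}(\bar{x}, \bar{y}) \wedge \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ 
is in strong solved form (with respect to $\bar{x}$) if 
$\theta_{TA}(\bar{x}, \bar{y})$ is in solved form and all literals of the form 
$Ly \neq t(\bar{x}, \bar{y})$, where $y \in \bar{y}$ and $t(\bar{x}, \bar{y})$ 
is a constructor term (properly) containing $\bar{x}$, are redundant. 

Example 9. In Ex. 6, formula (9) is not in strong solved form. However, it can 
be made into strong solved form by adding $s_1^\alpha y \neq x$ or 
$s_2^\alpha y \neq z$. 

The following predicates are needed to describe the construction algorithm: 

$\mathrm{Tree}(t) : \exists x_1, \ldots, x_n \ge 0\, \big( t^L = (\sum_{i=1}^{n} d_i x_i) + 1 \big)$, 

$\mathrm{Node}^\alpha(t, \bar{t}_\alpha) : t^L = \sum_{i=1}^{ar(\alpha)} t_i^L + 1$, 

$\mathrm{Tree}^\alpha(t) : \exists \bar{t}_\alpha\, \big(\mathrm{Node}^\alpha(t, \bar{t}_\alpha) \wedge \bigwedge_{1 \le i \le ar(\alpha)} \mathrm{Tree}(t_i)\big)$, 

where $\bar{t}_\alpha$ stands for $t_1, \ldots, t_{ar(\alpha)}$ and 
$d_1, \ldots, d_n$ are the distinct arities of constructors. The predicate 
$\mathrm{Tree}(t)$ is true if and only if $t^L$ is the length of a well-formed 
TA-term. The predicate $\mathrm{Node}^\alpha(t, \bar{t}_\alpha)$ forces the 
length of an $\alpha$-term with known children to be the sum of the lengths of 
its children plus 1. The predicate $\mathrm{Tree}^\alpha(t)$ states the length 
constraint of a well-formed $\alpha$-term. 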
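The Tree predicate says that a well-formed term has length one more than a non-negative integer combination of the constructor arities. The sketch below compares this characterization with a brute-force enumeration of achievable lengths up to a bound; the function names and the choice of arities are assumptions for illustration.

```python
from itertools import combinations_with_replacement


def term_lengths(arities, bound):
    """Lengths <= bound of well-formed terms, closing {1} under all constructors."""
    lengths = {1}  # a constant has length 1
    while True:
        new = {sum(children) + 1
               for d in arities
               for children in combinations_with_replacement(sorted(lengths), d)
               if sum(children) + 1 <= bound}
        if new <= lengths:
            return lengths
        lengths |= new


def tree(length, arities):
    """Tree: length - 1 is a non-negative integer combination of the arities."""
    target = length - 1
    if target < 0:
        return False
    reachable = [True] + [False] * target
    for v in range(1, target + 1):
        reachable[v] = any(v >= d and reachable[v - d] for d in arities)
    return reachable[target]
```

With constructors of arities 2 and 3, for example, every length except 2 is achievable, and both functions agree on that.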

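Algorithm 2 (Length Constraint Completion). Input: 
$\theta_{TA}(\bar{x}, \bar{y}) \wedge \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$, 
where $\theta_{TA}(\bar{x}, \bar{y})$ is a conjunction of literals in 
$\mathcal{L}_{TA}$ and $\theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ is a 
conjunction of literals in $\mathcal{L}_{\mathbb{Z}}$. Initially set 
$\Theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L) = \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$. 
For each term $t$ occurring in $\theta_{TA}(\bar{x}, \bar{y})$, add the 
following to $\Theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$. 

a. $t^L = 1$, if $t$ is a constant. 

b. $t^L = s^L$, if $t = s$. 

c. $\mathrm{Tree}(t)$, if $t$ is untyped. 

d. $\mathrm{Tree}^\alpha(t)$, if $t$ is $\alpha$-typed. 

e. $\mathrm{Node}^\alpha(t, \bar{t}_\alpha)$, if $t$ is $\alpha$-typed with 
children $\bar{t}_\alpha$. 

f. $\mathrm{CNT}_{k,n}(t^L)$, if $t$ occurs in an untyped cluster of size 
$n + 1$ and $\mathfrak{A}_{TA}$ has $k$ constants. 

g. $\mathrm{CNT}^\alpha_{k,n}(t^L)$, if $t$ occurs in an $\alpha$-cluster of 
size $n + 1$ and $\mathfrak{A}_{TA}$ has $k$ constants. 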

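Lemma 2. If 
$\theta_{TA}(\bar{x}, \bar{y}) \wedge \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ 
is in strong solved form and induces a set of mutually independent clusters, 
then $\Theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ computed by Alg. 2 is a 
completion of $\theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ in $\bar{x}$ with 
respect to $\theta_{TA}(\bar{x}, \bar{y})$. 

Lemma 3. Alg. 2 computes $\Theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ in time 
$O(n)$. 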

6 A New Quantifier Elimination Procedure for $Th(\mathfrak{A}^{\mathbb{Z}}_{TA})$ 

In this section we expand Alg. 1 to an elimination procedure for 
$Th(\mathfrak{A}^{\mathbb{Z}}_{TA})$. Since $\mathcal{L}^{\mathbb{Z}}_{TA}$ has 
two sorts, namely $\mathbb{Z}$ and $TA$, we need to show elimination of integer 
quantifiers as well as term quantifiers. 

6.1 Eliminate Quantifiers on Integer Variables 

We assume that formulae with quantifiers on integer variables are in the form 

$\exists \bar{z} : \mathbb{Z}\, \big(\theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}, \bar{z}) \wedge \theta_{TA}(\bar{x})\big)$,   (10) 

where $\bar{y}, \bar{z}$ are integer variables and $\bar{x}$ are term variables. 
Since $\theta_{TA}(\bar{x})$ is in $\mathcal{L}_{TA}$, we can move 
$\theta_{TA}(\bar{x})$ out of the scope of $\exists \bar{z}$, obtaining 

$\exists \bar{z} : \mathbb{Z}\; \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}, \bar{z}) \wedge \theta_{TA}(\bar{x})$.   (11) 

Now $\exists \bar{z} : \mathbb{Z}\; \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}, \bar{z})$ 
is essentially a Presburger formula and we can proceed to remove the block of 
existential quantifiers using Cooper's method [5,20]. In fact, we can defer the 
elimination of integer quantifiers until all term quantifiers have been 
eliminated. 

6.2 Eliminate Quantifiers on Term Variables 

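We assume that formulae with quantifiers on term variables are in the form 

$\exists \bar{x} : TA\, \big(\theta_{TA}(\bar{x}, \bar{y}) \wedge \Psi_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L, \bar{z})\big)$,   (12) 

where $\bar{x}, \bar{y}$ are term variables, $\bar{z}$ are integer variables, 
and $\Psi_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L, \bar{z})$ is an arbitrary 
Presburger formula. The following algorithm is based on Alg. 1. To save space, 
we do not list $\Psi_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L, \bar{z})$ until needed. 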

Algorithm 3. Input: 
$\exists \bar{x} : TA\, \big(\theta_{TA}(\bar{x}, \bar{y}) \wedge \Psi_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L, \bar{z})\big)$. 

Run Alg. 1 up to Step [7]. Apply the following subprocedures successively unless 
noted otherwise. 

1. Equality Completion (Alg. 4). 

2. Elimination of Equalities Containing $\bar{x}$ (Alg. 5). 

3. Propagation of Disequalities of the Form $Ly \neq t(\bar{x}, \bar{y})$ 
(Alg. 6). 

4. Reduction of Term Quantifiers to Integer Quantifiers (Alg. 7). 

The purpose of steps [1]-[3] is to transform (12) to a formula in strong solved 
form which induces a set of mutually independent clusters. Therefore by Alg. 2 
we can construct the length constraint completion for $\bar{x}$ which allows us 
to reduce term quantifiers to integer quantifiers. 

Algorithm 4 (Equality Completion). We assume the input formula is in the form 
(renaming the first part of (7)) 

$\exists \bar{x} : TA\, \big[ \bigwedge_i x_{f(i)} \neq t_i(\bar{x}, \bar{y}) \wedge \bigwedge_i L_i y_{g(i)} \neq s_i(\bar{x}, \bar{y}) \big]$,   (13) 

where $t_i$, $s_i$ are: (i) quantified variables $\bar{x}$, (ii) parameters 
$\bar{y}$, (iii) selector terms of parameters in the form $Ly$ 
($y \in \bar{y}$), (iv) constants in $\mathcal{A}$, or (v) constructor terms 
built from terms in (i)-(iv). Let $S$ be all terms including subterms which 
appear in (13). Guess an equality completion of $S$. It is easily seen that an 
equality completion is of the form (omitting integer literals) 

$\exists \bar{x} : TA\, \big[ \bigwedge_i x_{f(i)} \neq t_i(\bar{x}, \bar{y}) \wedge \bigwedge_i L_i y_{g(i)} \neq s_i(\bar{x}, \bar{y}) \wedge \bigwedge_i L'_i y_{g'(i)} = s'_i(\bar{x}, \bar{y}) \big]$.   (14) 

Algorithm 5 (Elimination of Equalities Containing $\bar{x}$). Let $\mathcal{E}$ 
denote the set of equalities containing $\bar{x}$. Exhaustively apply the 
following subprocedures until $\mathcal{E}$ is empty. Pick an 
$E \in \mathcal{E}$. 

A. $E$ is $x = u$. Then we know $x$ does not occur in $u$ and hence we can 
remove $\exists x$ by substituting $u$ for all occurrences of $x$. 

B. $E$ is $Ly = \alpha(t_1(\bar{x}, \bar{y}), \ldots, t_k(\bar{x}, \bar{y}))$. 
Then replace $E$ by 

$s_1^\alpha Ly = t_1(\bar{x}, \bar{y}),\; \ldots,\; s_k^\alpha Ly = t_k(\bar{x}, \bar{y})$. 

C. $E$ is 
$\alpha(u_1(\bar{x}, \bar{y}), \ldots, u_k(\bar{x}, \bar{y})) = \alpha(u'_1(\bar{x}, \bar{y}), \ldots, u'_k(\bar{x}, \bar{y}))$. 
Then replace $E$ by 

$u_1(\bar{x}, \bar{y}) = u'_1(\bar{x}, \bar{y}),\; \ldots,\; u_k(\bar{x}, \bar{y}) = u'_k(\bar{x}, \bar{y})$. 
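Case C of Alg. 5 (and Step 3 of Alg. 1) is a recursive decomposition of an equality between two terms with the same outermost constructor. A minimal sketch follows, assuming variables are strings and constructor terms are tuples `(alpha, child_1, ..., child_k)`; the function name `decompose` and the use of `None` for a constructor clash are choices made for this example.

```python
def decompose(lhs, rhs):
    """Exhaustively apply: alpha(u_1..u_k) = alpha(u'_1..u'_k) becomes u_i = u'_i.

    Returns the list of residual equalities, or None when two distinct
    constructors meet (such an equality is unsatisfiable by Axiom B).
    """
    if isinstance(lhs, tuple) and isinstance(rhs, tuple):
        if lhs[0] != rhs[0] or len(lhs) != len(rhs):
            return None
        residual = []
        for left_child, right_child in zip(lhs[1:], rhs[1:]):
            sub = decompose(left_child, right_child)
            if sub is None:
                return None
            residual += sub
        return residual
    return [(lhs, rhs)]  # at least one side is a variable: keep the equality


eqs = decompose(("a", "x", ("a", "y", "z")), ("a", "u", ("a", "v", "w")))
clash = decompose(("a", "x", "y"), ("b", "x", "y"))
```

The dual rule for disequalities (Step 4 of Alg. 1) would return the same pairs read as a disjunction of disequalities instead of a conjunction of equalities.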

Algorithm 6 (Propagation of Disequalities of the Form 
$Ly \neq t(\bar{x}, \bar{y})$). We only need to propagate those disequalities of 
the form $Ly \neq t(\bar{x}, \bar{y})$ such that 
$(Ly)^L = (t(\bar{x}, \bar{y}))^L$ and $t(\bar{x}, \bar{y})$ is a constructor 
term (properly) containing $\bar{x}$. This is done by the following sequence of 
disjunctive splittings. 

Let $\mathcal{D}$ denote the set of disequalities of the above form. 
Exhaustively apply the following subprocedures until $\mathcal{D}$ is empty. 
Pick 
$D : Ly \neq \alpha(t_1(\bar{x}, \bar{y}), \ldots, t_k(\bar{x}, \bar{y})) \in \mathcal{D}$. 

A. Disequality Splitting. Remove $D$ from $\mathcal{D}$ and add to 
$\theta_{TA}(\bar{x}, \bar{y})$ 

$\neg \mathrm{Is}_\alpha(Ly) \vee \bigvee_{1 \le i \le k} s_i^\alpha Ly \neq t_i(\bar{x}, \bar{y})$. 

Return if we take $\neg \mathrm{Is}_\alpha(Ly)$; continue otherwise. 

B. Length Splitting. Suppose we take 
$s_j^\alpha Ly \neq t_j(\bar{x}, \bar{y})$ ($1 \le j \le k$). Split on 

$(s_j^\alpha Ly)^L = (t_j(\bar{x}, \bar{y}))^L \vee (s_j^\alpha Ly)^L \neq (t_j(\bar{x}, \bar{y}))^L$. 

Return if we take $(s_j^\alpha Ly)^L \neq (t_j(\bar{x}, \bar{y}))^L$; continue 
otherwise. 

C. Equality Splitting. Suppose the cluster of $t_j(\bar{x}, \bar{y})$ contains 
$u_0, \ldots, u_n$. Split on 

$\bigvee_{i \le n} s_j^\alpha Ly = u_i \;\vee\; \bigwedge_{i \le n} s_j^\alpha Ly \neq u_i$. 

In case we take any disjunct $s_j^\alpha Ly = u_i$, return if $u_i$ does not 
contain $\bar{x}$; rerun Alg. 5 otherwise. Note that Alg. 5 can only be rerun 
finitely many times as each run will remove at least one existentially 
quantified variable. 

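The last case is that we choose 
$\bigwedge_{i \le n} s_j^\alpha Ly \neq u_i$. This in general will increase the 
size of $\mathcal{D}$ if some of the $u_i(\bar{x})$'s are also constructor terms 
containing $\bar{x}$. However if this happens, the $u_i(\bar{x})$'s will sit in 
a cluster whose rank is lower than that of the cluster of 
$\alpha(t_1(\bar{x}, \bar{y}), \ldots, t_k(\bar{x}, \bar{y}))$. As the rank 
ordering is well-founded, eventually the size of $\mathcal{D}$ will decrease. 

Algorithm 7 (Reduction of Term Quantifiers to Integer Quantifiers). Omitting 
the redundant disequalities of the form $Ly \neq t(\bar{x}, \bar{y})$, we may 
assume the resulting formula to be 

$\exists \bar{x} : TA\, \big[\theta^{(1)}_{TA}(\bar{x}, \bar{y}) \wedge \theta^{(2)}_{TA}(\bar{y}) \wedge \theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L) \wedge \Psi_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L, \bar{z})\big]$,   (15) 

where $\theta^{(1)}_{TA}(\bar{x}, \bar{y})$ is of the form 
$\bigwedge_i x_{f(i)} \neq t_i(\bar{x}, \bar{y})$, 
$\theta^{(2)}_{TA}(\bar{y})$ does not contain $\bar{x}$, 
$\theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ is the integer constraint obtained 
from Algs. 4, 6 (Step [B]), and 
$\Psi_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L, \bar{z})$ is the PA formula not listed 
before for simplicity. Now let $\theta_{TA}(\bar{x}, \bar{y})$ denote 
$\theta^{(1)}_{TA}(\bar{x}, \bar{y}) \wedge \theta^{(2)}_{TA}(\bar{y})$. Call 
Alg. 2 to get the completion $\Theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ of 
$\theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L)$ in $\bar{x}$ with respect to 
$\theta_{TA}(\bar{x}, \bar{y})$. Now we claim that (15) is equivalent to 

$\exists \bar{x} : TA\, \big[\theta^{(1)}_{TA}(\bar{x}, \bar{y}) \wedge \theta^{(2)}_{TA}(\bar{y}) \wedge \Theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L) \wedge \Psi_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L, \bar{z})\big]$,   (16) 

which in turn is equivalent to 

$\exists \bar{x}^L : \mathbb{Z}\, \big[\theta^{(2)}_{TA}(\bar{y}) \wedge \Theta_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L) \wedge \Psi_{\mathbb{Z}}(\bar{x}^L, \bar{y}^L, \bar{z})\big]$.   (17) 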

Lemma 4. Algs. 4, 5 and 6 produce a formula in strong solved form which induces 
a set of mutually independent clusters. 

Theorem 4. All transformations in Alg. 3 preserve equivalence. 

Theorem 5. Alg. 3 eliminates a block of quantifiers in double exponential time. 

Theorem 6. $BC_k(\mathfrak{A}^{\mathbb{Z}}_{TA})$ is decidable in 
$O(\exp_{2k}(n))$. 

7 Conclusion 

We presented new quantifier elimination procedures for the theory of term 
algebras and for the extended theory with Presburger arithmetic. The elimination 
procedures deal with a block of quantifiers of the same type in one step. The 
complexity of one-step elimination is exponential (resp. double exponential) for 
the theory of term algebras (resp. for the theory of term algebras with 
integers). 

The double exponential complexity is due to the propagation of literals of 
the form $Ly \neq t(\bar{x}, \bar{y})$ in a cluster. We believe that a more 
refined length constraint construction will remove this costly operation. 

We plan to apply these methods to the first-order theory of queues [21] and 
to the first-order theory of Knuth-Bendix order [23]. 